Abstract

Compressed sensing is a new data acquisition paradigm enabling universal, simple, and reduced-cost acquisition, by exploiting 
a sparse signal model. Most notably, recovery of the signal by computationally efficient algorithms is guaranteed for certain 
randomized acquisition systems. However, there is a discrepancy between the theoretical guarantees and practical applications. In 
applications, including Fourier imaging in various modalities, the measurements are acquired by inner products with vectors selected 
randomly (sampled) from a frame. Currently available guarantees are derived using a so-called restricted isometry property (RIP), 
which has only been shown to hold under ideal assumptions. For example, the sampling from the frame needs to be independent 
and identically distributed with the uniform distribution, and the frame must be tight. In practice though, one or more of the ideal 
assumptions is typically violated and none of the existing guarantees applies. 

Motivated by this discrepancy, we propose two related changes in the existing framework: (i) a generalized RIP called the 
restricted biorthogonality property (RBOP); and (ii) correspondingly modified versions of existing greedy pursuit algorithms, 
which we call oblique pursuits. Oblique pursuits are guaranteed using the RBOP without requiring ideal assumptions; hence, the 
guarantees apply to practical acquisition schemes. Numerical results show that oblique pursuits also perform competitively with, 
or sometimes better than their conventional counterparts. 

Index Terms 

Compressed sensing, oblique projection, restricted isometry property, random matrices. 

I. Introduction 

A. Compressed Sensing 

Many natural and man-made signals admit sparse representations. Compressed sensing is a new paradigm of data acquisition that takes advantage of this property to reduce the amount of data that needs to be acquired to recover the signal of interest. Unlike the conventional paradigm, in which large quantities of data are acquired, often followed by compression, compressed sensing acquires minimally redundant data directly in a universal way that does not depend on the data.

The model for the acquisition is formally stated as the following linear system: Let $f \in \mathbb{K}^d$ (where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$) be the unknown signal. The measurement vector $y \in \mathbb{K}^m$ obtained by sensing matrix $A \in \mathbb{K}^{m \times d}$ is

$$y = Af + w$$

where $w \in \mathbb{K}^m$ denotes additive noise. In the conventional paradigm, an arbitrary signal $f \in \mathbb{K}^d$ is stably reconstructed when the rows of $A$ constitute a frame for $\mathbb{K}^d$, which requires redundant measurements ($m > d$). In contrast, compressed sensing aims to reconstruct signals that are (approximately) $s$-sparse over a dictionary $D \in \mathbb{K}^{d \times n}$ (cf. [3]) from compressive measurements ($m < d$). Let $x \in \mathbb{K}^n$ be the coefficient vector of $f$ over $D$ such that $f \approx Dx$ with $x$ being $s$-sparse.¹ Then, the composition $\Psi = AD$ can be viewed as a sensing matrix for $x$ that produces the measurement vector $y$. Once an estimate $\hat{x}$ of $x$ is computed, $D\hat{x}$ provides an estimate of the unknown signal $f$. Hence, we may focus on the recovery of sparse $x$.
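As a concrete illustration of the acquisition model, the following minimal sketch builds $y = Af + w$ with $f = Dx$ and forms the effective sensing matrix $\Psi = AD$. The sizes, the random orthonormal dictionary, the Gaussian sensing matrix, and the noise level are all assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the sketch is reproducible
d, n, m, s = 64, 64, 24, 3          # hypothetical sizes with m < d

# Stand-in dictionary: a random orthonormal basis of R^d (an assumption).
D, _ = np.linalg.qr(rng.standard_normal((d, n)))

# s-sparse coefficient vector x and the signal f = D x.
x = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x[support] = rng.standard_normal(s)
f = D @ x

# Stand-in sensing matrix and additive noise (both assumptions).
A = rng.standard_normal((m, d)) / np.sqrt(m)
w = 0.01 * rng.standard_normal(m)

y = A @ f + w        # measurement model y = A f + w
Psi = A @ D          # effective sensing matrix acting on the sparse x
```

Because $f = Dx$ exactly in this sketch, `y` coincides with `Psi @ x + w`, which is why the recovery problem can be posed directly in terms of $x$.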

In an ideal case with exact sparse signal model and noise-free measurements, if any $2s$ columns of $\Psi$ are linearly independent, the unknown $s$-sparse $x$ is recovered as the unique solution to the linear system $\Psi x = y$ [6]-[8]. In typical examples of compressed sensing (e.g., $\Psi$ is a matrix with independently and identically distributed (i.i.d.) Gaussian entries), this is often achieved with $m = 2s$. However, this algebraic guarantee only shows the uniqueness of the solution. Furthermore, it is only valid in the absence of both measurement noise and error in the sparse signal model.

In practice, both the computational cost of signal recovery and its robustness against noise and model error are of interest. For certain matrices $\Psi$, the unknown $x$ is stably recovered from compressive measurements using efficient algorithms. The required number of measurements for a stable recovery is quantified through a property of $\Psi$ called the restricted isometry property (RIP).

Definition 1.1: The $s$-restricted isometry constant $\delta_s(\Psi)$ of $\Psi$ is defined as the smallest $\delta$ that satisfies

$$(1-\delta)\|x\|_2^2 \le \|\Psi x\|_2^2 \le (1+\delta)\|x\|_2^2, \quad \forall s\text{-sparse } x. \qquad (1.1)$$

Matrix $\Psi$ satisfies the RIP of order $s$ if $\delta_s(\Psi) < c$ for some constant $c \in (0, 1)$. Intuitively, smaller $\delta_s(\Psi)$ implies that $\Psi^*\Psi x$ is closer to $x$ for all $s$-sparse $x$. Although, in general, the recovery of $s$-sparse $x$ from compressive measurements is NP-hard even in the noiseless case, the recovery can be accomplished efficiently (in polynomial time) and with guaranteed accuracy when $\Psi$ satisfies the RIP with certain parameters (order and threshold). Such results are among the major achievements of compressed sensing theory. For example, when $\delta_{2s}(\Psi) < \sqrt{2} - 1$, the solution to an $\ell_1$-norm-based convex optimization formulation provides a good approximation of the unknown $s$-sparse $x$ [10]. The approximation error in this result is guaranteed to be small, and vanishes in the noiseless case. A computationally efficient alternative is provided by iterative greedy algorithms [11]-[14], which exploit the RIP of $\Psi$ to compute an approximation of $x$. These iterative greedy algorithms provide similar approximation guarantees when $\delta_{ks}(\Psi) < c$, where $k \in \{2, 3, 4\}$ and $c \in (0, 1)$ are constants specified by the algorithms. Different applications of the RIP require different values for the parameters $k$ and $c$. Henceforth, we assume that $k$ and $c$ are arbitrarily fixed constants as above.
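To make Definition 1.1 concrete, the sketch below evaluates $\delta_s(\Psi)$ by brute force: inequality (1.1) restricted to a support $J$ holds with constant $\delta$ exactly when every eigenvalue of the Gram matrix $\Psi_J^*\Psi_J$ lies in $[1-\delta, 1+\delta]$. The function name is hypothetical, and the enumeration over all supports is exponential in $s$, so this is only feasible for toy sizes.

```python
import itertools
import numpy as np

def restricted_isometry_constant(Psi, s):
    """Brute-force delta_s(Psi): worst eigenvalue deviation from 1 over all
    s-column Gram matrices. Exponential cost; toy sizes only."""
    n = Psi.shape[1]
    delta = 0.0
    for J in itertools.combinations(range(n), s):
        cols = Psi[:, list(J)]
        gram = cols.conj().T @ cols
        eig = np.linalg.eigvalsh(gram)          # ascending, real eigenvalues
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta

# Illustrative use on a small Gaussian matrix (sizes are assumptions).
rng = np.random.default_rng(0)
Psi_small = rng.standard_normal((20, 8)) / np.sqrt(20)
delta_1 = restricted_isometry_constant(Psi_small, 1)
delta_2 = restricted_isometry_constant(Psi_small, 2)
```

Since every support of size 1 is contained in one of size 2, `delta_1` cannot exceed `delta_2`; the constant is nondecreasing in $s$.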

The question of feasibility of compressed sensing then reduces to determining whether, and with how many measurements, $\Psi$ satisfies the RIP.² Certain random matrices $\Psi \in \mathbb{K}^{m \times n}$ satisfy $\delta_s(\Psi) < \delta$ with high probability when the number of measurements $m$ satisfies $m = O(\delta^{-2} s \ln^{\alpha} n)$ for some small integer $\alpha$ [17]-[20]. This result, when combined with the aforementioned RIP-based guarantees of the recovery algorithms, enables "compressive sensing" ($m < d$). For example, if $\Psi$ satisfies the strong concentration property, that is, $\|\Psi x\|_2^2$ is highly concentrated around its expectation for all $x$, then $\delta_s(\Psi) < \delta$ holds with $m = O(\delta^{-2} s \ln(n/s))$ [17]. In words, a number $m$ of measurements that is proportional to the number $s$ of nonzeros, and only logarithmic in the number $n$ of unknowns, suffices for stable and computationally efficient recovery. This celebrated result of compressed sensing has been extended to the case where $A$ satisfies the strong concentration property with $\delta_s(A) < \delta$ and $D$ satisfies the RIP, stating that $\delta_s(AD) < \delta_s(D) + \delta + \delta \cdot \delta_s(D)$ holds with $m = O(\delta^{-2} s \ln(n/s))$ [21]. Now, the RIP of $D$ is often relatively easy to satisfy. Recall that the role of $D$ is to provide a sparse representation of $f$. Although redundant $D$ (with $n > d$) performs better in this respect, it is often the case that $f$ is sparse over a $D$ that is a basis (e.g., a piecewise smooth signal $f$ over a wavelet basis $D$). In this case, $\delta_s(D)$ is easily bounded using the condition number of $D$. Furthermore, if $D$ is an orthonormal basis, then $\delta_s(D) = 0$ for any $s \le n$. As for the strong concentration property of $A$, it is satisfied by an i.i.d. Gaussian or Bernoulli matrix [17]. This has been extended recently to any matrix satisfying the RIP with certain parameters, when postmultiplied by a random diagonal matrix of $\pm 1$ entries [22]. When implementing such a sensing system is technically feasible, it would provide a sensing matrix $A$ that admits efficient computation [23].

¹ When $w$ is assumed arbitrary, the model error term $A(f - Dx)$ can be absorbed into $w$. Alternatively, $x$ can be assumed approximately sparse. We consider the former case in this paper.

² There also exist analyses not in terms of the RIP. However, these analyses only apply to certain ideal random matrices such as an i.i.d. Gaussian matrix, which, although reasonable in models for regression problems in statistics, is rarely used in practical acquisition systems.

However, although the aforementioned random matrix models are interesting in theory, they are rarely used in practice. In most practical signal acquisition systems, the linear functionals used for acquiring the measurements (rows of $A$) are determined by the physics of the specific modality and by design constraints of the sensor. In compressed sensing applied to these systems [24], the sensing matrix $A$ does not follow the aforementioned random matrix models; instead, its rows are i.i.d. samples from the uniform distribution on a set that constitutes a frame in $\mathbb{K}^d$.³

To describe the sensing matrix more precisely, we recall the definition of a frame [25]. We denote by $L_2(\Omega, \nu)$ the Hilbert space of functions defined on a compact set $\Omega$ that are square integrable with respect to a probability measure $\nu$ on $\Omega$, and by $\ell_2^d$ the $d$-dimensional Euclidean space.

Definition 1.2: Let $\mu$ denote the uniform probability measure on a compact set $\Omega$. Let $(\phi_\omega)_{\omega \in \Omega}$ be a set of vectors in $\mathbb{K}^d$. Let $\Phi : L_2(\Omega, \mu) \to \ell_2^d$ be the synthesis operator associated with $(\phi_\omega)_{\omega \in \Omega}$ defined as

$$\Phi h = \int_\Omega \phi_\omega h(\omega)\, d\mu(\omega), \quad \forall h \in L_2(\Omega, \mu), \qquad (1.2)$$

with its adjoint $\Phi^* : \ell_2^d \to L_2(\Omega, \mu)$, which is the corresponding analysis operator given by

$$(\Phi^* f)(\omega) = \langle \phi_\omega, f \rangle, \quad \forall \omega \in \Omega, \ \forall f \in \ell_2^d. \qquad (1.3)$$

Then, $(\phi_\omega)_{\omega \in \Omega}$ is a frame if the frame operator $\Phi\Phi^*$ satisfies $\alpha \le \lambda_{\min}(\Phi\Phi^*) \le \lambda_{\max}(\Phi\Phi^*) \le \beta$ for some positive real numbers $\alpha$ and $\beta$. In particular, if the frame operator $\Phi\Phi^*$ is a scaled identity, then $(\phi_\omega)_{\omega \in \Omega}$ is a tight frame.

Let $\nu$ be a probability measure on $\Omega$. Let $\bar{z}$ denote the complex conjugate of $z \in \mathbb{C}$ and $[m]$ denote the set $\{1, \ldots, m\}$. The sensing matrix $A \in \mathbb{K}^{m \times d}$ is constructed from a frame $(\phi_\omega)_{\omega \in \Omega}$ as

$$(A)_{k,\ell} = \frac{1}{\sqrt{m}}\, \overline{(\phi_{\omega_k})_\ell}, \quad \forall k \in [m], \ \ell \in [d] \qquad (1.4)$$

for random indices $(\omega_k)_{k=1}^m$ in $\Omega$ chosen i.i.d. with respect to $\nu$. We call this type of matrix a random frame matrix. It is the model for a sensing matrix of primary interest in this paper, and we will assume henceforth that $A$ is defined by (1.4).

³ The use of i.i.d. sampling may end up with a repetition of the same row. However, repeating one row of $A$ as an additional row does not increase the restricted isometry constant of $A$. A similar construction of $A$, where the rows are selected from a frame using Bernoulli sampling, has also been studied [4], [18]. While Bernoulli sampling does not cause repetition, the size of the selection is no longer deterministic; it is only concentrated around $m$ with high probability. The imperfection of these two sampling schemes becomes negligible as the size of $A$ increases. We focus on the i.i.d. sampling scheme in this paper.

Random frame matrices arise in numerous applications of compressed sensing. We list a few below. For simplicity, they are described for the 1D case.

Example 1.3: An important example of a random frame matrix is a random partial discrete Fourier transform (DFT) matrix. Let $\phi_\omega = [1, e^{-j2\pi\omega}, \ldots, e^{-j2\pi(d-1)\omega}]^T$ be defined for $\omega \in \Omega \triangleq \{1/d, \ldots, (d-1)/d, 1\}$. In this setup, $\nu : \Omega \to [0, 1]$ is a cumulative density function on $\Omega$ and $\frac{d\nu}{d\mu}(\omega)$ denotes the probability that $\omega$ will be chosen, multiplied by $d$. Then, an $m \times d$ random partial DFT matrix is constructed from $(\phi_\omega)_{\omega \in \Omega}$ by (1.4). The frame $(\phi_\omega)_{\omega \in \Omega}$ in this example is a tight frame, and $\sup_\omega \|\phi_\omega\|_{\ell_\infty^d} / \|\phi_\omega\|_{\ell_2^d}$, which will play a role in our subsequent discussion, achieves its minimum $d^{-1/2}$. Sensing matrices of this kind arise in practical applications of compressed sensing such as multi-coset sampling and spectrum-blind recovery of multiband signals at sub-Nyquist rates [26], [27].⁴ Similar random matrices also arise in more recent studies on compressed sensing of analog signals [29]-[32].
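The construction in Example 1.3 can be sketched in a few lines. The helper names `dft_frame` and `random_frame_matrix`, the sizes, and the seed below are hypothetical; the code simply draws $m$ frequencies i.i.d. from a given probability vector and stacks the conjugated frame vectors as rows, scaled by $1/\sqrt{m}$ as in (1.4).

```python
import numpy as np

def dft_frame(d):
    """Columns are phi_omega for omega in {1/d, ..., (d-1)/d, 1}."""
    omega = np.arange(1, d + 1) / d
    ell = np.arange(d)[:, None]
    return np.exp(-2j * np.pi * ell * omega[None, :])

def random_frame_matrix(Phi, prob, m, rng):
    """Draw m frame indices i.i.d. from prob and build A as in (1.4)."""
    idx = rng.choice(Phi.shape[1], size=m, replace=True, p=prob)
    A = Phi[:, idx].conj().T / np.sqrt(m)     # row k is conj(phi_{omega_k})^T / sqrt(m)
    return A, idx

rng = np.random.default_rng(1)                # illustrative seed
d, m = 32, 12                                 # hypothetical sizes
Phi = dft_frame(d)
uniform = np.full(d, 1.0 / d)
A, idx = random_frame_matrix(Phi, uniform, m, rng)
```

Every entry of `A` has magnitude $1/\sqrt{m}$, which reflects the flatness of the Fourier vectors behind the minimal ratio $d^{-1/2}$ mentioned in the example.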

Example 1.4: One author of this paper proposed the compressive acquisition of signals in Fourier imaging systems [28], [33], which is one of the works that invented the notion of compressed sensing. This idea has been applied with refinements to various modalities such as magnetic resonance imaging (MRI) [24], [34], photo-acoustic tomography [35], radar [36], radar imaging [37], [38], and astronomical imaging [39]. The sensing matrix $A$ for compressed sensing in Fourier imaging systems is a random partial Fourier transform matrix with continuous-valued frequencies (continuous random partial Fourier matrix, henceforth), which is obtained similarly to the previous example. Let $\phi_\omega = [1, e^{-j2\pi\omega}, \ldots, e^{-j2\pi(d-1)\omega}]^T$ be defined for $\omega \in \Omega = [-\frac{1}{2}, \frac{1}{2})$. The frame $(\phi_\omega)_{\omega \in \Omega}$ in this example is a continuous tight frame, and the quantity $\sup_\omega \|\phi_\omega\|_{\ell_\infty^d} / \|\phi_\omega\|_{\ell_2^d}$ achieves its minimum $d^{-1/2}$.

Example 1.5: In MRI, the Fourier measurements are usually modeled as obtained from the input signal modified by pointwise multiplication with a mask $\lambda \in \mathbb{K}^d$, representing the receiving coil sensitivity profile. Let $\Lambda = \mathrm{diag}(\lambda)$ denote the diagonal matrix with the elements of $\lambda$ on the diagonal. Let $\phi_\omega = \Lambda^* [1, e^{-j2\pi\omega}, \ldots, e^{-j2\pi(d-1)\omega}]^T$ be defined for $\omega \in \Omega = [-\frac{1}{2}, \frac{1}{2})$. If $\lambda$ has no zero element, then $(\phi_\omega)_{\omega \in \Omega}$ is a frame that spans $\mathbb{K}^d$. Otherwise, $(\phi_\omega)_{\omega \in \Omega}$ is a frame for the subspace $S$ of $\mathbb{K}^d$ spanned by the standard basis vectors corresponding to the nonzero elements of $\lambda$. In the latter case, letting the signal space be $S$ instead of $\mathbb{K}^d$, we modify the inverse problem so that $A$ constructed by (1.4) is a map from $S$ to $\mathbb{K}^m$. Note that each vector in the frame is multiplied from the right by $\Lambda$ compared to that in Example 1.4. In this example, unless the nonzero elements of $\lambda$ have the same magnitudes, $(\phi_\omega)_{\omega \in \Omega}$ does not satisfy the two properties coming from a Fourier system (tightness and minimal $\sup_\omega \|\phi_\omega\|_{\ell_\infty^d} / \|\phi_\omega\|_{\ell_2^d}$). Therefore, we do not restrict our interest to the Fourier case and consider a general frame $(\phi_\omega)_{\omega \in \Omega}$.

Because random frame matrices are so ubiquitous in compressed sensing, the analysis of their RIP is of major interest. Although random frame matrices do not satisfy the strong concentration property, other tools are available for the analysis of their RIP. In particular, the RIP of a partial Fourier matrix has been studied using noncommutative probability theory [18], [19]. The extension of this analysis to the RIP of a random frame matrix [20] enables handling a more general class of sensing matrices. Notably, all known analyses [18]-[20], [40] focused on the case where $D$ corresponds to an orthonormal basis. These analyses also assumed either the exact isotropy property, $\mathbb{E}A^*A = I_d$ [18]-[20], or the so-called near isotropy property, $\|\mathbb{E}A^*A - I_d\| = O(n^{-1/2})$ [40]. There is no alternative sufficient condition that does not require these properties. In fact, though, these RIP analyses further extend to the following Theorem 1.6 (proved in Section III), which addresses the case of $\Psi = AD$, where $A$ is a random frame matrix and $D$ is not necessarily an orthonormal basis, and which furthermore allows a non-vanishing deviation from isotropy.

⁴ This was the invention of compressed sensing of analog signals. See [28] for a survey of this early work.

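Theorem 1.6: Let $A \in \mathbb{K}^{m \times d}$ be a random matrix constructed from a frame $(\phi_\omega)_{\omega \in \Omega}$ by (1.4) and let $D = [d_1, \ldots, d_n] \in \mathbb{K}^{d \times n}$ satisfy $\delta_s(D) < 1$. Suppose that $\sup_\omega \max_j |\langle \phi_\omega, d_j \rangle| \le K$. Let $\Psi = AD$. Then, $\delta_s(\Psi) < \delta + \|\mathbb{E}A^*A - I_d\| + \delta_s(D) + \|\mathbb{E}A^*A - I_d\|\, \delta_s(D)$ holds with high probability for $m = O(\delta^{-2} s \ln^4 n)$.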



The inequality in Theorem 1.6 indicates that $\delta_{ks}(\Psi) < c$ holds with high probability for $m = O(s \ln^4 n)$ if $\delta_{ks}(D)$ and $\|\mathbb{E}A^*A - I_d\|$ are each smaller than a suitable fraction of $c$. Combined with the aforementioned RIP-based guarantees, this result again enables compressive sensing when the conditions given in Theorem 1.6 are satisfied.



B. Motivation: Failure of Guarantees in Practical Applications 

While the RIP is essential for all existing performance guarantees for compressed sensing with random frame sensing matrices, it turns out that this property is satisfied only under certain unrealistic assumptions. Most notably, although compressed sensing has been proposed to accelerate the acquisition in imaging systems [2], [24], and some of the most widely studied applications of compressed sensing to date are in such systems, the RIP has not been shown to hold for the associated sensing matrices in a realistic setup. More specifically, $\|\mathbb{E}A^*A - I_d\|$ is not negligible, which makes even the upper bound on $\delta_s(\Psi)$ given by Theorem 1.6, which is the most relaxed condition on deviation from isotropy known to date, too conservative to be used for RIP-based recovery guarantees.

One reason for the increase of $\|\mathbb{E}A^*A - I_d\|$ from the ideal case is the use of a nonuniform distribution in the construction of $A$. In Examples 1.3 and 1.4, the sensing matrices $A$ were constructed from i.i.d. samples from a tight frame $(\phi_\omega)_{\omega \in \Omega}$. In this case, if the i.i.d. sampling is done in accordance with the uniform distribution, then $\mathbb{E}A^*A = I_d$. However, in practice, i.i.d. sampling using a nonuniform distribution is often preferred for natural signals: it is desirable to take more measurements of lower frequency components, which contain more of the signal energy. Therefore, acquisition at frequencies sampled nonuniformly with a variable density is preferred [24]. As a consequence, the exact isotropy property is violated. Depending on the probability distribution, $\|\mathbb{E}A^*A - I_d\|$ is often not negligible, and even larger than 1, which renders the upper bound on $\delta_s(\Psi)$ in Theorem 1.6 useless. Therefore, no known RIP analysis applies to Fourier imaging applications.
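The effect of variable-density sampling on isotropy can be checked numerically for the DFT frame of Example 1.3, where $\mathbb{E}A^*A = \sum_\omega p_\omega \phi_\omega \phi_\omega^*$ and $p_\omega$ is the probability of drawing $\omega$. The sketch below is illustrative only: the particular density favoring low frequencies and the size $d$ are assumptions, not the densities used in the cited imaging work.

```python
import numpy as np

d = 32                                              # hypothetical signal length
omega = np.arange(1, d + 1) / d
Phi = np.exp(-2j * np.pi * np.arange(d)[:, None] * omega[None, :])

def isotropy_deviation(prob):
    """Spectral norm of E[A* A] - I_d when frequencies are drawn i.i.d. from prob."""
    expected_gram = (Phi * prob[None, :]) @ Phi.conj().T
    return np.linalg.norm(expected_gram - np.eye(d), 2)

uniform = np.full(d, 1.0 / d)

# Assumed variable density: mass decays with the distance from the zero frequency,
# where frequencies are taken modulo 1 (omega = 1 is the zero frequency).
dist = np.minimum(omega, 1.0 - omega)
density = 1.0 / (1.0 + 20.0 * dist) ** 2
density /= density.sum()

dev_uniform = isotropy_deviation(uniform)
dev_variable = isotropy_deviation(density)
```

For this frame the expected Gram matrix is diagonalized by the unitary DFT, so the deviation equals $\max_\omega |d\,p_\omega - 1|$; a density that concentrates on low frequencies therefore pushes the deviation above 1, in line with the discussion above.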



Another reason for the increase of $\|\mathbb{E}A^*A - I_d\|$ from the ideal case is that $(\phi_\omega)_{\omega \in \Omega}$ is not a tight frame. As shown in Example 1.5, even in a Fourier imaging system, $(\phi_\omega)_{\omega \in \Omega}$ can be a non-tight frame due to the presence of a mask. Furthermore, the application of compressed sensing is not restricted to Fourier imaging systems. The idea of compressed sensing and recovery using sparsity also applies to other inverse problems in imaging described by non-orthogonal operators (e.g., Fredholm integral equations of the first kind). Optical diffusion tomography [41] is a concrete example of compressed sensing in such a scenario. As another example, the sensing matrix that arises in compressed sensing in shift-invariant spaces [29] is not necessarily obtained from a tight frame.



Yet another reason for the failure of the upper bound on $\delta_s(\Psi)$ in Theorem 1.6 has to do with the dictionary $D$. Indeed, to achieve $\delta_s(\Psi) < c$ with $m = O(s \ln^4 n)$, it is necessary that both $\|\mathbb{E}A^*A - I_d\|$ and $\delta_s(D)$ are less than a certain threshold. However, verification of this condition for $\delta_s(D)$ is usually computationally expensive. For the special case where $D$ has full column rank (hence, $d \ge n$), $\delta_s(D)$ is easily bounded from above by $\|D^*D - I_n\|$. In particular, if $D$ corresponds to an orthonormal basis, then $D^*D = I_n$, which implies $\delta_s(D) = 0$. Otherwise, $\delta_s(D)$ vanishes as $D$ approaches an orthonormal basis. However, it is often too restrictive to make $\|D^*D - I_n\|$ less than a small threshold below 1. Moreover, without this constraint, $D$ can provide a better sparse representation, which is also desired for stable recovery. In particular, for a data-adaptive $D$, the property that $\|D^*D - I_n\|$ is less than a given threshold is not guaranteed. In this all too common situation, all known RIP analyses break down: they only provide a conservative upper bound on $\delta_s(\Psi)$, which does not enable the RIP-based recovery guarantees.

In summary, in most practical compressed sensing applications, the effective sensing matrix $\Psi = AD$ may fail to satisfy the
RIP for one or more of the following reasons: the i.i.d. sampling in the construction of A does not use the uniform distribution; 
the frame used in the construction of A is not tight; or the dictionary D does not have a sufficiently small restricted isometry 
constant. From these observations, we conclude that none of the existing performance guarantees for recovery algorithms 
applies to the aforementioned applications of compressed sensing. 



C. Contributions 

Recall that, unlike the $\ell_1$-norm-based recovery, greedy recovery algorithms were designed to explicitly exploit the property that $\Psi^*\Psi x \approx x$ for sparse $x$. For example, in the derivation of the CoSaMP algorithm [11], the procedure of applying $\Psi^*$ to $y = \Psi x$ for sparse $x$ was called the computation of a "proxy" signal, which reveals the information about the locations of the nonzero elements of $x$. The same idea was also used for deriving other iterative greedy algorithms [12]-[14]. Indeed, if $\Psi$ satisfies the RIP, then the use of the (transpose of) the same matrix $\Psi$ to compute a proxy is a promising approach. Otherwise, one can employ a different matrix $\tilde{\Psi}$ to get a better proxy $\tilde{\Psi}^*\Psi x$. The required property is that $\tilde{\Psi}^*\Psi x \approx x$ for sparse $x$. To improve the recovery algorithms in this direction, we first extend the RIP to a property of a pair of matrices $\Psi, \tilde{\Psi} \in \mathbb{K}^{m \times n}$ called the restricted biorthogonality property (RBOP).

Definition 1.7: The $s$-restricted biorthogonality constant $\theta_s(M)$ of $M \in \mathbb{K}^{n \times n}$ is defined as the smallest $\delta$ that satisfies

$$|\langle y, Mx \rangle - \langle y, x \rangle| \le \delta \|x\|_2 \|y\|_2, \quad \forall s\text{-sparse } x, y \text{ with common support}. \qquad (1.5)$$

The pair $(\Psi, \tilde{\Psi})$ satisfies the RBOP of order $s$ if $\theta_s(\tilde{\Psi}^*\Psi) < c$ for some constant $c \in (0, 1)$.⁵ Intuitively, smaller $\theta_s(\tilde{\Psi}^*\Psi)$ implies that $\tilde{\Psi}^*\Psi x$ becomes closer to $x$ for all $s$-sparse $x$. In other words, any $s$ columns of $\Psi$ and $\tilde{\Psi}$ corresponding to the same indices behave like a biorthogonal basis. If $\tilde{\Psi} = \Psi$, then $\theta_s(\Psi^*\Psi)$ reduces to $\delta_s(\Psi)$; hence, the RBOP of $(\Psi, \Psi)$ reduces to the RIP of $\Psi$.

⁵ As in the case of the RIP, the threshold value of $c$ for which the RBOP is said to be satisfied depends on the application.
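Definition 1.7 can be evaluated by brute force for small matrices: condition (1.5) restricted to a common support $J$ holds with constant $\delta$ exactly when the spectral norm of the $J \times J$ block of $M - I$ is at most $\delta$. The function name, sizes, and test matrices below are hypothetical, and the enumeration over supports is only practical for toy problems.

```python
import itertools
import numpy as np

def restricted_biorthogonality_constant(M, s):
    """Brute-force theta_s(M): largest spectral norm of the J x J block of M - I
    over all supports J of size s. Exponential cost; toy sizes only."""
    n = M.shape[0]
    theta = 0.0
    for J in itertools.combinations(range(n), s):
        block = M[np.ix_(list(J), list(J))] - np.eye(s)
        theta = max(theta, np.linalg.norm(block, 2))
    return theta

# Illustrative pair: a well-conditioned square Psi and its exact biorthogonal dual.
rng = np.random.default_rng(2)
Psi = np.eye(5) + 0.1 * rng.standard_normal((5, 5))
Psi_dual = np.linalg.inv(Psi).conj().T            # Psi_dual^* Psi = I

theta_dual = restricted_biorthogonality_constant(Psi_dual.conj().T @ Psi, 2)
theta_self = restricted_biorthogonality_constant(Psi.conj().T @ Psi, 2)
```

Here `theta_self` is the ordinary restricted isometry constant of `Psi`, while `theta_dual` is numerically zero, which illustrates how a well-chosen second matrix can satisfy the RBOP even when the RIP constant is not small.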



We then modify the greedy recovery algorithms so that the modified algorithms employ both $\Psi$ and $\tilde{\Psi}$ and, in particular, exploit the RBOP of $(\Psi, \tilde{\Psi})$ to provide an approximation guarantee. In fact, modified thresholding and forward greedy algorithms using a different matrix have already been proposed by Schnass and Vandergheynst [42]. However, our work is different from theirs in several important respects. Schnass and Vandergheynst [42] propose to use a numerically optimized $\tilde{\Psi}$ to minimize a version of the Babel function. However, although sufficient conditions given in terms of the Babel function are easily computable, the resulting guarantees for the recovery performance are conservative. Furthermore, their numerical algorithm to design $\tilde{\Psi}$ is a heuristic, and does not provide any guarantee on the value of the Babel function achieved. In contrast, we propose an explicit construction of $\tilde{\Psi}$ so that $\theta_s(\tilde{\Psi}^*\Psi) \ll 1$ holds. To show the construction, we recall the definition of a biorthogonal frame that extends the notion of a biorthogonal basis.

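Definition 1.8: Let $(\phi_\omega)_{\omega \in \Omega}$ and $(\tilde{\phi}_\omega)_{\omega \in \Omega}$ be sets of vectors in $\mathbb{K}^d$. Let $L_2(\Omega, \mu)$ be as defined in Definition 1.2. Let $\Phi^* : \ell_2^d \to L_2(\Omega, \mu)$ be the analysis operator associated to $(\phi_\omega)_{\omega \in \Omega}$ defined in (1.3). Let $\tilde{\Phi} : L_2(\Omega, \mu) \to \ell_2^d$ be the synthesis operator associated to $(\tilde{\phi}_\omega)_{\omega \in \Omega}$ defined similarly to (1.2). Then, $(\phi_\omega, \tilde{\phi}_\omega)_{\omega \in \Omega}$ is a biorthogonal frame if $\tilde{\Phi}\Phi^* = I_d$.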

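Matrix $\tilde{\Psi}$ is then constructed as the composition $\tilde{\Psi} = \tilde{A}\tilde{D}$. We construct $\tilde{A} \in \mathbb{K}^{m \times d}$ from the dual frame $(\tilde{\phi}_\omega)_{\omega \in \Omega}$ by

$$(\tilde{A})_{k,\ell} = \frac{1}{\sqrt{m}} \left[ \frac{d\nu}{d\mu}(\omega_k) \right]^{-1} \overline{(\tilde{\phi}_{\omega_k})_\ell}, \quad \forall k \in [m], \ \ell \in [d] \qquad (1.6)$$

where $(\omega_k)_{k=1}^m$ are the same indices as used to define the samples from $(\phi_\omega)_{\omega \in \Omega}$ in the construction of $A$ in (1.4). Assuming $\frac{d\nu}{d\mu}(\omega) > 0$, then, by the construction of $A$ and $\tilde{A}$, it follows that the pair $(A, \tilde{A})$ satisfies the dual isotropy property $\mathbb{E}\tilde{A}^*A = I_d$.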
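The dual isotropy property can be verified numerically in a discretized version of Example 1.5. The sketch below is an illustration under several assumptions: the frequency grid is the finite grid of Example 1.3 rather than a continuum, the mask and the sampling density are made up, the dual frame is taken to be the canonical dual $\Lambda^{-1}$ times the Fourier vector, and each row of $\tilde{A}$ is weighted by the reciprocal of $\frac{d\nu}{d\mu}(\omega_k) = d\,p_{\omega_k}$.

```python
import numpy as np

d = 16                                                   # hypothetical signal length
omega = np.arange(1, d + 1) / d
F = np.exp(-2j * np.pi * np.arange(d)[:, None] * omega[None, :])  # Fourier vectors as columns

# Assumed nonvanishing mask, giving a non-tight frame phi = Lambda^* f.
lam = 1.0 + 0.5 * np.cos(2 * np.pi * np.arange(d) / d)
Phi = np.conj(lam)[:, None] * F
Phi_dual = (1.0 / lam)[:, None] * F                      # canonical dual: Lambda^{-1} f

# Assumed variable sampling density p on the frequency grid.
dist = np.minimum(omega, 1.0 - omega)
prob = 1.0 / (1.0 + 20.0 * dist) ** 2
prob /= prob.sum()
weight = 1.0 / (d * prob)                                # reciprocal of d(nu)/d(mu)

# Expectations of A* A and of A_dual* A under i.i.d. sampling from prob.
E_AA = (Phi * prob[None, :]) @ Phi.conj().T
E_dual = (Phi_dual * (prob * weight)[None, :]) @ Phi.conj().T

def sample_pair(m, rng):
    """Draw shared indices and build A and the weighted dual matrix from them."""
    idx = rng.choice(d, size=m, p=prob)
    A = Phi[:, idx].conj().T / np.sqrt(m)
    A_dual = weight[idx][:, None] * Phi_dual[:, idx].conj().T / np.sqrt(m)
    return A, A_dual
```

In this setup `E_dual` equals the identity up to rounding, whereas `E_AA` is far from it, so the pair remains isotropic in the dual sense even though both the uniform-sampling and tight-frame assumptions are violated.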



Remark 1.9: We propose modified greedy pursuit algorithms in Section II that use both $\Psi = AD$ and $\tilde{\Psi} = \tilde{A}\tilde{D}$ and are guaranteed using the RBOP of $(\Psi, \tilde{\Psi})$. Therefore, it is important to check whether $\tilde{\Psi}^* = \tilde{D}^*\tilde{A}^*$ can be efficiently implemented. The discussion on $(D, \tilde{D})$ is deferred to the next subsections and we only discuss the computational issue with $\tilde{A}$ here. In practice, $A^*$ is implemented using fast algorithms without forming a dense matrix explicitly. For example, if $A$ is a partial DFT matrix, then $A^*$ is implemented as the fast Fourier transform (FFT) applied to the zero-padded vector. Likewise, if $A$ is a continuous partial Fourier matrix, then the nonuniform FFT (NUFFT) [43] can be used for fast computation. In this case, since our construction of $\tilde{A}$ in (1.6) only involves row-wise rescaling of $A$ by constant factors, $\tilde{A}^*$ is also implemented using the same fast algorithms. In the more general biorthogonal case, once the synthesis operator $\tilde{\Phi}$ is implemented as a fast algorithm, $\tilde{A}^*$ is also computed efficiently using the same algorithm. In fact, in many applications, the biorthogonal dual system is given analytically. For example, if the frame $(\phi_\omega)_{\omega \in \Omega}$ is given as a filter bank system, designing perfect reconstruction filters that provide the corresponding biorthogonal dual frame is well studied [44]. Similar arguments apply to the analysis operator of analytic frames such as overcomplete DCT or wavelet packets.
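The statement that the adjoint of a partial DFT matrix is an FFT of a zero-padded vector can be sketched as follows. The function names are hypothetical, the frequency grid and scaling follow Example 1.3 and (1.4), and repeated indices (possible under i.i.d. sampling) are accumulated into the same FFT bin.

```python
import numpy as np

def partial_dft(d, idx):
    """Dense m x d partial DFT matrix with rows conj(phi_omega)^T / sqrt(m),
    for frequencies omega = (idx + 1) / d."""
    omega = (np.asarray(idx) + 1) / d
    return np.exp(2j * np.pi * np.outer(omega, np.arange(d))) / np.sqrt(len(idx))

def adjoint_via_fft(y, idx, d):
    """Apply A* to y without forming A: scatter y into a length-d vector
    (zero elsewhere) and take one FFT."""
    z = np.zeros(d, dtype=complex)
    np.add.at(z, (np.asarray(idx) + 1) % d, y)   # omega = 1 wraps to bin 0
    return np.fft.fft(z) / np.sqrt(len(idx))

# Illustrative check against the dense matrix (sizes and indices are assumptions).
rng = np.random.default_rng(3)
d = 16
idx = np.array([0, 3, 3, 15, 7])                 # includes a repeated index
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)
dense_result = partial_dft(d, idx).conj().T @ y
fast_result = adjoint_via_fft(y, idx, d)
```

Under the row-wise rescaling in (1.6) with a tight Fourier frame, the adjoint of the rescaled matrix would amount to calling `adjoint_via_fft` on `y` multiplied elementwise by the per-row weights, so the cost stays at one FFT.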

Regarding the construction of $\tilde{D}$, we consider the following two cases: (i) $D$ corresponds to a basis for $\mathbb{K}^d$ ($d = n$); (ii) $D$ satisfies the RIP with a certain parameter. We let $\tilde{D} = D(D^*D)^{-1}$ for the former case and $\tilde{D} = D$ for the latter case. The discussion of the RBOP of this construction is deferred until after the exposition of the new recovery algorithms.
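For case (i), the choice $\tilde{D} = D(D^*D)^{-1}$ gives $\tilde{D}^*D = I$ by direct substitution. The short check below uses a made-up, well-conditioned, non-orthonormal basis; the sizes and perturbation level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8                                              # hypothetical dimension, d = n
D = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # assumed non-orthonormal basis

# D_dual = D (D* D)^{-1}, computed through a linear solve instead of an explicit inverse:
# (D* D)^{-1} D* is the adjoint of D_dual.
D_dual = np.linalg.solve(D.conj().T @ D, D.conj().T).conj().T

biorthogonality_gap = np.linalg.norm(D_dual.conj().T @ D - np.eye(n), 2)
orthonormality_gap = np.linalg.norm(D.conj().T @ D - np.eye(n), 2)
```

The first gap is at rounding level while the second is not, which is the sense in which the pair $(D, \tilde{D})$ is biorthogonal even though $D$ itself is not orthonormal.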



Now, we return to the discussion of the recovery algorithms. While Schnass and Vandergheynst [42] only replaced $\Psi$ by $\tilde{\Psi}$ in the steps of computing a proxy in forward greedy algorithms (MP and OMP), we also replace the orthogonal projection used in the update of the residual in OMP by a corresponding oblique projection obtained from $\Psi$ and $\tilde{\Psi}$. Therefore, we propose a different variation of OMP called Oblique Matching Pursuit (ObMP), which is guaranteed using the RBOP of $(\Psi, \tilde{\Psi})$. We also propose similar modifications of iterative greedy recovery algorithms and their RIP-based guarantees. Because the modified algorithms are different from the original algorithms, we assign them new names, with the modifier "oblique". For example, SP is extended to oblique subspace pursuit (ObSP). CoSaMP, IHT, and HTP are likewise extended to ObCoSaMP, ObIHT, and ObHTP, respectively. We call these modified greedy algorithms based on the RBOP oblique pursuits. In the numerical experiments in this paper, in scenarios where one or more of the ideal assumptions (i.i.d. sampling according to the uniform distribution, tight frame $(\phi_\omega)_{\omega \in \Omega}$, or orthonormal basis $D$) are violated, the oblique pursuits perform better than, or at least competitively with, their conventional counterparts.
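The two changes described above (a proxy computed with $\tilde{\Psi}$ and an oblique rather than orthogonal residual update) can be sketched as follows. This is only a rough illustration and not the algorithm statement from Section II: the function name, the stopping rule of exactly $s$ iterations, and the specific oblique projector $\Psi_J(\tilde{\Psi}_J^*\Psi_J)^{-1}\tilde{\Psi}_J^*$ are assumptions made for this sketch.

```python
import numpy as np

def oblique_matching_pursuit_sketch(Psi, Psi_dual, y, s):
    """Hypothetical OMP-like loop using Psi_dual for the proxy and an oblique
    projection for the residual. Reduces to plain OMP when Psi_dual is Psi."""
    n = Psi.shape[1]
    support = []
    coef = np.zeros(0, dtype=complex)
    residual = np.asarray(y, dtype=complex).copy()
    for _ in range(s):
        proxy = Psi_dual.conj().T @ residual
        if support:
            proxy[support] = 0.0                   # do not reselect chosen indices
        support.append(int(np.argmax(np.abs(proxy))))
        P = Psi[:, support]
        Q = Psi_dual[:, support]
        # Coefficients of the oblique projection of y onto range(P) along range(Q)^perp.
        coef = np.linalg.solve(Q.conj().T @ P, Q.conj().T @ y)
        residual = y - P @ coef
    x_hat = np.zeros(n, dtype=complex)
    x_hat[support] = coef
    return x_hat, support, residual

# Illustrative run on made-up data.
rng = np.random.default_rng(5)
Psi = rng.standard_normal((10, 12))
Psi_dual = Psi + 0.05 * rng.standard_normal((10, 12))
y = rng.standard_normal(10)
x_hat, support, residual = oblique_matching_pursuit_sketch(Psi, Psi_dual, y, 3)
```

By construction the final residual is orthogonal to the selected columns of `Psi_dual` rather than to those of `Psi`, which is what distinguishes the oblique update from the orthogonal one.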

Importantly, the oblique pursuits come with RBOP-based approximation guarantees. In particular, similarly to its conventional counterpart, each iterative oblique pursuit algorithm is guaranteed when $\theta_{ks}(\tilde{\Psi}^*\Psi) < c$, where $k \in \{2, 3, 4\}$ and $c \in (0, 1)$ are constants specified by the algorithms. The number of measurements required for the guarantees of oblique pursuits is also similar to that required in the ideal scenario by their conventional counterparts. When combined with the subsequent RBOP analysis of $(\Psi, \tilde{\Psi})$ for random frame sensing matrices, the recovery by the iterative oblique pursuit algorithms is guaranteed with $m = O(s \ln^4 n)$. In particular, we show that it is no longer necessary to have $\|\mathbb{E}A^*A - I_d\| \ll 1$. Therefore, the obtained guarantees apply in realistic setups of the aforementioned CS applications.

The degrees of freedom added by the freedom to design $\tilde{\Psi}$ make the RBOP easier to satisfy, under milder assumptions than the RIP. In particular, with the proposed construction of $\tilde{\Psi}$, the RBOP of $(\Psi, \tilde{\Psi})$ holds without requiring the (near) isotropy property of $A$. More specifically, depending on whether $D$ corresponds to a basis or satisfies the RIP, the RIP analysis in Theorem 1.6 is extended to the following theorems. Recall that we proposed different constructions of $\tilde{D}$ for the two cases.

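Theorem 1.10: Let $A, \tilde{A} \in \mathbb{K}^{m \times d}$ be random matrices constructed from a biorthogonal frame $(\phi_\omega, \tilde{\phi}_\omega)_{\omega \in \Omega}$ by (1.4) and (1.6), respectively. Let $D = [d_1, \ldots, d_n]$ and $\tilde{D} \in \mathbb{K}^{d \times n}$ ($d = n$) satisfy $\tilde{D}^*D = I_d$. Let $\Psi = AD$ and $\tilde{\Psi} = \tilde{A}\tilde{D}$. Suppose that $\sup_\omega \max_j |\langle \phi_\omega, d_j \rangle| \le K$. Then, $\theta_s(\tilde{\Psi}^*\Psi) < \delta$ holds with high probability for $m = O(\delta^{-2} s \ln^4 n)$.

Theorem 1.11: Let $A, \tilde{A} \in \mathbb{K}^{m \times d}$ be random matrices constructed from a biorthogonal frame $(\phi_\omega, \tilde{\phi}_\omega)_{\omega \in \Omega}$ by (1.4) and (1.6), respectively. Let $D = [d_1, \ldots, d_n] \in \mathbb{K}^{d \times n}$ satisfy $\delta_s(D) < 1$. Let $\Psi = AD$ and $\tilde{\Psi} = \tilde{A}D$. Suppose that $\sup_\omega \max_j |\langle \phi_\omega, d_j \rangle| \le K$. Then, $\theta_s(\tilde{\Psi}^*\Psi) < \delta + \delta_s(D)$ holds with high probability for $m = O(\delta^{-2} s \ln^4 n)$.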



Note that the upper bounds on $\theta_s(\tilde{\Psi}^*\Psi)$ in Theorems 1.10 and 1.11 do not depend on $\|\mathbb{E}A^*A - I_d\|$. Therefore, unlike the RIP, which breaks down when the ideal assumptions, such as i.i.d. sampling according to the uniform distribution and a tight frame, are violated, the RBOP continues to hold even with such violations.

In summary, we introduce a new tool for the design, analysis, and performance guarantees of sparse recovery algorithms, and illustrate its application by deriving new guaranteed versions of several of the most popular recovery algorithms.



D. Organization of the Paper 

In Section II, we propose the oblique pursuit algorithms and their guarantees in terms of the RBOP. In Section III, we elaborate the RBOP analysis of random frame matrices in various scenarios. The empirical performance of the oblique pursuit algorithms is studied in Section IV, and we conclude the paper in Section V.



E. Notation 

Symbol $\mathbb{N}$ is the set of natural numbers (excluding zero), and $[n]$ denotes the set $\{1, \ldots, n\}$ for $n \in \mathbb{N}$. Symbol $\mathbb{K}$ denotes a scalar field, which is either the real field $\mathbb{R}$ or the complex field $\mathbb{C}$. The vector space of $d$-tuples over $\mathbb{K}$ is denoted by $\mathbb{K}^d$ for $d \in \mathbb{N}$. Similarly, for $m, n \in \mathbb{N}$, the vector space of $m \times n$ matrices over $\mathbb{K}$ is denoted by $\mathbb{K}^{m \times n}$.

We will use various notations on a matrix $A \in \mathbb{K}^{m \times n}$. The range space spanned by the columns of $A$ will be denoted by $\mathcal{R}(A)$. The adjoint operator of $A$ will be denoted by $A^*$. This notation is also used for the adjoint of a linear operator that is not necessarily a finite matrix. The $j$th column of $A$ is denoted by $a_j$ and the submatrix of $A$ with columns indexed by $J \subset [n]$ is denoted by $A_J$. The $k$th row of $A$ is denoted by $a^k$, and the submatrix of $A$ with rows indexed by $K \subset [m]$ is denoted by $A^K$. Symbol $e_k$ will denote the $k$th standard basis vector of $\mathbb{K}^d$, where $d$ is implicitly determined for compatibility. The $k$th element of a $d$-tuple $x \in \mathbb{K}^d$ is denoted by $(x)_k$. The $k$th largest singular value of $A$ will be denoted by $\sigma_k(A)$. For Hermitian symmetric $A$, $\lambda_k(A)$ will denote the $k$th largest eigenvalue of $A$. The Frobenius norm and the spectral norm of $A$ are denoted by $\|A\|_F$ and $\|A\|$, respectively. The inner product is denoted by $\langle \cdot, \cdot \rangle$. The embedding Hilbert space, where the inner product is defined, is not explicitly mentioned when it is obvious from the context. For a subspace $S$ of $\mathbb{K}^d$, matrices $P_S \in \mathbb{K}^{d \times d}$ and $P_S^\perp \in \mathbb{K}^{d \times d}$ denote the orthogonal projectors onto $S$ and its orthogonal complement $S^\perp$, respectively. For $J \subset [n]$, the coordinate projection $\Pi_J : \mathbb{K}^n \to \mathbb{K}^n$ is defined by

$$(\Pi_J x)_k = \begin{cases} (x)_k & \text{if } k \in J \\ 0 & \text{else.} \end{cases} \qquad (1.7)$$



Symbols P and E will denote the probability and the expectation with respect to a certain distribution. Unless otherwise 
mentioned, the distribution shall be obvious from the context. 

II. Oblique Pursuit Algorithms 

In this section, we propose modified greedy pursuit algorithms that use both $\Psi$ and $\tilde{\Psi}$ and show that they are guaranteed by the RBOP of $(\Psi, \tilde{\Psi})$, similarly to the way that the corresponding conventional pursuit algorithms are guaranteed by the RIP of $\Psi$. The modified greedy pursuit algorithms will be called oblique pursuit algorithms, because they involve oblique projections instead of the orthogonal projections in the conventional algorithms.

Recall that greedy pursuit algorithms seek an approximation of signal $f$ that is exactly sparse over dictionary $D$. Let $x^\star \in \mathbb{K}^n$ be an $s$-sparse vector such that

$$x^\star = \arg\min_{x \in \mathbb{K}^n} \{ \|f - Dx\|_2 : \|x\|_0 \le s \}.$$

We assume that the approximation error $f - Dx^\star$ is small compared to $\|f\|_2$.
The measurement vector $y \in \mathbb{K}^m$ is then given by

$$y = A(Dx^\star) + z$$

where the distortion term $z$ includes both the approximation error $A(f - Dx^\star)$ in modeling $f$ as an $s$-sparse signal over $D$, and additive noise $w$,

$$z = A(f - Dx^\star) + w.$$

Let $D\hat{x}$ be an estimate of $f$ given by a greedy pursuit algorithm such that $\hat{x}$ is exactly $s$-sparse. Then, since $x^\star - \hat{x}$ is $2s$-sparse,

$$\|f - D\hat{x}\|_2 \le \|f - Dx^\star\|_2 + \|D(x^\star - \hat{x})\|_2 \le \|f - Dx^\star\|_2 + \sqrt{1 + \delta_{2s}(D)} \, \|\hat{x} - x^\star\|_2.$$

Since the first term $\|f - Dx^\star\|_2$ corresponds to a fundamental limit for any greedy algorithm, we will focus in the remainder of this section on bounding $\|\hat{x} - x^\star\|_2$.

To describe both the original greedy pursuit algorithms and our modifications, we recall the definition of the hard thresholding operator that makes a given vector exactly $s$-sparse by zeroing the elements except the $s$ largest in magnitude. Formally, $H_s : \mathbb{K}^n \to \mathbb{K}^n$ is defined by

$$H_s(x) = \arg\min_{w} \{ \|x - w\|_2 : \|w\|_0 \le s \}.$$
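The hard thresholding operator can be sketched in a few lines. The following is a minimal illustration, not taken from the paper: the function names `hard_threshold` and `support` are hypothetical, vectors are plain Python lists, and ties between equal magnitudes are broken in favor of the lower index (the definition above leaves ties unspecified).

```python
def hard_threshold(x, s):
    """Return H_s(x): keep the s largest-magnitude entries of x, zero the rest.

    Assumption: ties are broken in favor of the lower index.
    """
    if s <= 0:
        return [0 for _ in x]
    # Indices sorted by decreasing magnitude, then by increasing index.
    order = sorted(range(len(x)), key=lambda k: (-abs(x[k]), k))
    keep = set(order[:s])
    return [x[k] if k in keep else 0 for k in range(len(x))]


def support(x):
    """Return supp(x), the set of indices of the nonzero entries of x."""
    return {k for k, v in enumerate(x) if v != 0}
```

Since `abs` also handles complex numbers, the same sketch applies to vectors over either the real or the complex field.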

Remark 2.1: All algorithms that appear in this section extend straightforwardly to the versions that exploit the structure of 
the support, a.k.a. recovery algorithms for model-based compressed sensing [45]. The only task required in this modification is
to replace the hard thresholding operator by a projection onto s-sparse vectors with supports satisfying certain structure (e.g., 
tree). The extension to model-based CS explicitly depends on the support and is only available for the greedy algorithms. To 
focus on the main contribution of this paper, we will not pursue the details in this direction here. 

A. Oblique Thresholding 

We start with a modification of the simple thresholding algorithm. The thresholding algorithm computes an estimate of the support $J$ as the indices of the $s$ largest entries of $\Psi^* y$, which is the support of $H_s(\Psi^* y)$.

Let us consider a special case, where $\Psi$ has full column rank and $y = \Psi x^\star$ is noise free. While exact support recovery by naive thresholding of $\Psi^* y$ is not guaranteed, thresholding of $\tilde{\Psi}^* y$ with the biorthogonal dual $\tilde{\Psi} = (\Psi^\dagger)^*$ is guaranteed to provide exact support recovery. This example leaves room to improve thresholding using another properly designed matrix $\tilde{\Psi}$. In compressed sensing, we are interested in an underdetermined system given by $\Psi$; hence, $\Psi$ cannot have full column rank. In this setting, the use of the canonical dual $\tilde{\Psi} = (\Psi^\dagger)^*$ is not necessarily a good choice of $\tilde{\Psi}$.

Schnass and Vandergheynst [42] proposed a version of the thresholding algorithm that uses another matrix $\tilde{\Psi}$ different from $\Psi$. We call this algorithm Oblique Thresholding (ObThres), as an example of the oblique pursuit algorithms that will appear in the sequel.
Algorithm 1: Oblique Thresholding (ObThres)

$J \leftarrow \mathrm{supp}(H_s(\tilde{\Psi}^* y))$;



Schnass and Vandergheynst [42, Theorem 3] showed a sufficient condition for exact support recovery by ObThres in the noiseless case ($z = 0$), given by

$$\frac{\tilde{\mu}_1(s, \Psi, \tilde{\Psi})}{\min_j |\tilde{\psi}_j^* \psi_j|} < \frac{\min_{j \in J^\star} |(x^\star)_j|}{2 \|x^\star\|_\infty} \tag{2.1}$$

where the cross Babel function $\tilde{\mu}_1(s, \Psi, \tilde{\Psi})$ is defined by

$$\tilde{\mu}_1(s, \Psi, \tilde{\Psi}) = \max_{k} \max_{|J| = s, \, k \notin J} \sum_{j \in J} |\tilde{\psi}_k^* \psi_j|.$$

Since the left-hand side of (??) is easily computed for given $\tilde{\Psi}$ and $\Psi$, Schnass and Vandergheynst [42] proposed a numerical algorithm that designs $\tilde{\Psi}$ to minimize the left-hand side of (??). However, the minimization problem is not convex and there is no guarantee for the quality of the resulting $\tilde{\Psi}$. Moreover, their optimality criterion for $\tilde{\Psi}$ is based on the sufficient condition in (??), which is conservative (see [42, Fig. 1]). In particular, unlike the RBOP, there is no known analysis of the (cross) Babel function of random frame matrices.

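Instead, we derive an alternative sufficient condition for exact support recovery by ObThres, given in terms of the RBOP of $(\Psi, \tilde{\Psi})$.

Theorem 2.2 (ObThres): Let $x^\star \in \mathbb{K}^n$ be $s$-sparse with support $J^\star \subset [n]$. Let $y = \Psi x^\star + z$. Suppose that $\tilde{\Psi}$ and $\Psi$ satisfy

$$\min_{j \in J^\star} |(x^\star)_j| > 2 \theta_{s+1}(\tilde{\Psi}^* \Psi) \|x^\star\|_2 + 2 \max_j \|\tilde{\psi}_j\|_2 \|z\|_2. \tag{2.2}$$

Then, ObThres will identify $J^\star$ exactly.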



Compared to the numerical construction of $\tilde{\Psi}$ by Schnass and Vandergheynst [42], our construction of $\tilde{\Psi}$ in (??) for a random frame matrix $\Psi$ has two advantages: it is analytic; and it guarantees the RBOP of $(\Psi, \tilde{\Psi})$. Therefore, with this construction, the computation of $\theta_{s+1}(\tilde{\Psi}^* \Psi)$ for given $\Psi$ and $\tilde{\Psi}$, which involves a combinatorial search, is not needed.

For the noiseless case ($z = 0$), the sufficient condition in (??) reduces to

$$\theta_{s+1}(\tilde{\Psi}^* \Psi) < \frac{\min_{j \in J^\star} |(x^\star)_j|}{2 \|x^\star\|_2}. \tag{2.3}$$

Even in this case though, the upper bound in (??) depends on both the dynamic range of $x^\star$ and the sparsity level $s$. Therefore, compared to the guarantees of the iterative greedy pursuit algorithms in Section II-C, the guarantee of ObThres is rather weak. In fact, the other algorithms in Section II-C outperform ObThres empirically too. However, ObThres will serve as a building block for the iterative greedy pursuit algorithms.
B. Oblique Matching Pursuit

Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) are forward greedy pursuit algorithms. Unlike thresholding, which selects the support elements by a single step of hard thresholding, (O)MP increments an estimate $J$ of the support $J^\star$ by adding one element per step chosen by a greedy criterion:

$$k^\star = \arg\max_{k} |(\Psi^*(y - \Psi \hat{x}))_k| \tag{2.4}$$

where $y - \Psi \hat{x}$ is the residual vector computed with the estimate $\hat{x}$ of $x^\star$, whose image $\Psi \hat{x}$ is spanned by the columns of $\Psi_J$.

Given the estimated support $J$, OMP updates the estimate $\hat{x}$ optimally in the sense that $\hat{x}$ satisfies

$$\hat{x} = \arg\min_{x} \{ \|y - \Psi x\|_2 : \mathrm{supp}(x) \subset J \}. \tag{2.5}$$

Therefore, the criterion in (??) for OMP reduces to

$$k^\star = \arg\max_{k} |(\Psi^* P_{\mathcal{R}(\Psi_J)}^\perp y)_k| = \arg\max_{k} |\langle P_{\mathcal{R}(\Psi_J)}^\perp \psi_k, P_{\mathcal{R}(\Psi_J)}^\perp y \rangle|, \tag{2.6}$$

which clearly describes the idea of "orthogonal matching".

Schnass and Vandergheynst [42] proposed variations of MP and OMP that, using $\tilde{\Psi}$, replace (??) by

$$k^\star = \arg\max_{k} |(\tilde{\Psi}^*(y - \Psi \hat{x}))_k| \tag{2.7}$$

and provided the following sufficient condition [42, Theorem 4] for exact support recovery by the OMP using (??):

$$\frac{\tilde{\mu}_1(s, \Psi, \tilde{\Psi})}{\min_j |\tilde{\psi}_j^* \psi_j|} < \frac{1}{2}. \tag{2.8}$$

As for ObThres, they proposed to use a numerically designed $\tilde{\Psi}$ that minimizes the left-hand side of (??) (the same criterion as in their analysis of ObThres).

As discussed in the previous subsection, while easily computable for given $\Psi$ and $\tilde{\Psi}$, this sufficient condition is conservative and is not likely to be satisfied even when $\tilde{\Psi}$ is numerically optimized. Thus, the resulting algorithm will have no guarantee. Another weakness of the sufficient condition in (??) is that it has been derived without considering the orthogonal matching in OMP, and thus ignores the improvement of OMP over MP. Indeed, the same condition provides a partial guarantee of MP that each step of MP will select an element of the support $J^\star$, which is not necessarily different from the previously selected ones.

In view of the weaknesses of the approach based on coherence, we turn instead to the RIP. Davies and Wakin [46] provided a sufficient condition for exact support recovery by OMP in terms of the RIP, which has been refined in the setting of joint sparsity by Lee et al. [47, Proposition 7.11]. These analyses explicitly reflect the "orthogonal matching". In particular, one key property required for the RIP-based sufficient conditions is that the RIP is preserved under the orthogonal projection with respect to a few columns of $\Psi$, i.e., for all $J \subset [n]$ satisfying $|J| < s$,

^.(^K(*,)*[„]\,7) < (2-9) 



This condition is an improvement on [46, Lemma 3.2] and was shown [47, Proof of Proposition 7.11] using the interlacing eigenvalues property of the Schur complement [47, Lemma A.2].



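The objective function in the orthogonal matching in (??) can be rewritten as

$$|\langle P_{\mathcal{R}(\Psi_J)}^\perp \psi_k, P_{\mathcal{R}(\Psi_J)}^\perp y \rangle| = \Big| \sum_{j \in J^\star \setminus J} \langle P_{\mathcal{R}(\Psi_J)}^\perp \psi_k, P_{\mathcal{R}(\Psi_J)}^\perp \psi_j \rangle (x^\star)_j + \langle P_{\mathcal{R}(\Psi_J)}^\perp \psi_k, z \rangle \Big|. \tag{2.10}$$

The RIP of $\Psi$ together with (??) imply that the left-hand side of (??) is close to $|(\Pi_{J^\star \setminus J} x^\star)_k|$, with the perturbation bounded as a function of the RIC of $\Psi$. Then, orthogonal matching will choose $k^\star$ as

$$k^\star = \arg\max_{k \in J^\star \setminus J} |(x^\star)_k|.$$

This explains why orthogonal matching is a good strategy when $\Psi$ satisfies the RIP.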

The OMP using (??) by Schnass and Vandergheynst [42] still employs the orthogonal matching. However, we are interested in the scenario where $\Psi$ does not satisfy the RIP but instead satisfies the RBOP with a certain $\tilde{\Psi}$. Unfortunately, unlike the RIP of $\Psi$, the RBOP of $(\Psi, \tilde{\Psi})$ is no longer valid when the orthogonal projection $P_{\mathcal{R}(\Psi_J)}^\perp$ is applied to both matrices. Instead, we show that the RBOP of $(\Psi, \tilde{\Psi})$ is preserved under an oblique projection, which is analogous to the RIP result in (??). To this end, we recall the definition of an oblique projection.

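Definition 2.3 (Oblique projection): Let $\mathcal{V}, \mathcal{W} \subset \mathcal{H}$ be two subspaces such that $\mathcal{V} \oplus \mathcal{W}^\perp = \mathcal{H}$. The oblique projection onto $\mathcal{V}$ along $\mathcal{W}^\perp$, denoted by $E_{\mathcal{V}, \mathcal{W}^\perp}$, is defined as a linear map $E_{\mathcal{V}, \mathcal{W}^\perp} : \mathcal{H} \to \mathcal{H}$ that satisfies

1) $(E_{\mathcal{V}, \mathcal{W}^\perp}) x = x$, $\forall x \in \mathcal{V}$.

2) $(E_{\mathcal{V}, \mathcal{W}^\perp}) x = 0$, $\forall x \in \mathcal{W}^\perp$.

By the definition of the oblique projection, it follows that

$$I_{\mathcal{H}} - E_{\mathcal{V}, \mathcal{W}^\perp} = E_{\mathcal{W}^\perp, \mathcal{V}} \quad \text{and} \quad E_{\mathcal{V}, \mathcal{W}^\perp}^* = E_{\mathcal{W}, \mathcal{V}^\perp}.$$

When $\mathcal{V} = \mathcal{W}$, the oblique projection reduces to the orthogonal projection $P_{\mathcal{V}}$ onto $\mathcal{V}$.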
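As a small worked illustration of Definition 2.3 (the specific vectors are chosen here for the example and are not from the paper), take $\mathcal{H} = \mathbb{R}^2$ with $\mathcal{V}$ spanned by $v = (1, 0)^T$ and $\mathcal{W}$ spanned by $w = (1, 1)^T$, so that $\mathcal{W}^\perp$ is spanned by $(1, -1)^T$:

```latex
E_{\mathcal{V},\mathcal{W}^\perp} = v (w^* v)^{-1} w^*
  = \begin{bmatrix} 1 \\ 0 \end{bmatrix} (1)^{-1} \begin{bmatrix} 1 & 1 \end{bmatrix}
  = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix},
\qquad
E_{\mathcal{V},\mathcal{W}^\perp} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
\qquad
E_{\mathcal{V},\mathcal{W}^\perp} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\\[4pt]
I - E_{\mathcal{V},\mathcal{W}^\perp} = \begin{bmatrix} 0 & -1 \\ 0 & 1 \end{bmatrix}
  = E_{\mathcal{W}^\perp,\mathcal{V}},
\qquad
\| E_{\mathcal{V},\mathcal{W}^\perp} \| = \sqrt{2} = \frac{1}{\cos(\pi/4)} .
```

Both defining properties hold, the complementary projector annihilates $\mathcal{V}$ and fixes $\mathcal{W}^\perp$, and the spectral norm exceeds 1 because $\mathcal{V}$ and $\mathcal{W}$ are separated by an angle of $\pi/4$.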

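Lemma 2.4: Suppose that $M, \tilde{M} \in \mathbb{K}^{m \times k}$ for $k \le m$ satisfy that $\tilde{M}^* M$ has full rank. Then, $\mathcal{R}(M)$ and $\mathcal{R}(\tilde{M})^\perp$ are complementary, i.e., $\mathcal{R}(M) \cap \mathcal{R}(\tilde{M})^\perp = \{0\}$.

Proof of Lemma 2.4: Assume that there is a nonzero $x \in \mathcal{R}(M) \cap \mathcal{R}(\tilde{M})^\perp$. Then, $x = My$ for some $y \in \mathbb{K}^k$ and $\tilde{M}^* M y = 0$ since $x \in \mathcal{R}(\tilde{M})^\perp = \mathcal{N}(\tilde{M}^*)$. Since $\tilde{M}^* M$ is invertible, it follows that $y = 0$, which is a contradiction. ■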



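The RBOP of $(\Psi, \tilde{\Psi})$ implies that $\tilde{\Psi}_J^* \Psi_J$ is invertible. Furthermore, $\mathcal{R}(\Psi_J)$ and $\mathcal{R}(\tilde{\Psi}_J)^\perp$ are complementary by Lemma 2.4. Therefore, $\Psi_J (\tilde{\Psi}_J^* \Psi_J)^{-1} \tilde{\Psi}_J^*$ is an oblique projection onto $\mathcal{R}(\Psi_J)$ along $\mathcal{R}(\tilde{\Psi}_J)^\perp$. It follows that $E = I_m - \Psi_J (\tilde{\Psi}_J^* \Psi_J)^{-1} \tilde{\Psi}_J^*$ is an oblique projection onto $\mathcal{R}(\tilde{\Psi}_J)^\perp$ along $\mathcal{R}(\Psi_J)$.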
Lemma 2.5: Suppose that $\Psi, \tilde{\Psi} \in \mathbb{K}^{m \times n}$ satisfy

6'^($**) < 1. 

Let $J \subset [n]$. Let $E = I_m - \Psi_J (\tilde{\Psi}_J^* \Psi_J)^{-1} \tilde{\Psi}_J^*$. Then,



Remark 2.6: When $\tilde{\Psi} = \Psi$, Lemma 2.5 reduces to (??).

Proof of Lemma 2.5: Follows directly from Lemma A.6 in the Appendix.

Lemma 2.5 suggests that if $\Psi$ does not satisfy the RIP but $\tilde{\Psi}$ and $\Psi$ satisfy the RBOP, then it might be better to replace the orthogonal matching by the "oblique matching" given by



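$$k^\star = \arg\max_{k} |\langle E^* \tilde{\psi}_k, E y \rangle|, \tag{2.11}$$

where $E$ is an oblique projector defined as

$$E = I_m - \Psi_J (\tilde{\Psi}_J^* \Psi_J)^{-1} \tilde{\Psi}_J^*.$$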

To effect the appropriate modification in OMP, recall that orthogonal matching in (??) corresponds to matching each column of $\Psi$ with the residual $y - \Psi \hat{x}$ computed with a solution $\hat{x}$ to the least squares problem in (??). Similarly, oblique matching is obtained by replacing the least squares problem in (??) by the following weighted least squares problem:

$$\hat{x} = \arg\min_{x} \{ \|\tilde{\Psi}_J^\dagger (y - \Psi x)\|_2 : \mathrm{supp}(x) \subset J \}.$$

We call the resulting forward greedy pursuit algorithm with the oblique matching Oblique Matching Pursuit (ObMP). ObMP is summarized in Algorithm 2. In particular, when $\tilde{\Psi} = \Psi$, ObMP reduces to the conventional OMP. Like OMP, ObMP does not select the same support element more than once. This is guaranteed since the selected columns are within the null space of the oblique projection associated with the oblique matching.

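Algorithm 2: Oblique Matching Pursuit (ObMP)

$J \leftarrow \emptyset$; $\hat{x} \leftarrow 0$;
while $|J| < s$ do

$k^\star \leftarrow \arg\max_{k \in [n] \setminus J} |(\tilde{\Psi}^*(y - \Psi \hat{x}))_k|$;

$J \leftarrow J \cup \{k^\star\}$;

$\hat{x} \leftarrow \arg\min_{x} \{ \|\tilde{\Psi}_J^\dagger (y - \Psi x)\|_2 : \mathrm{supp}(x) \subset J \}$;

end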
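The steps of Algorithm 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles real-valued matrices stored as lists of rows, the helper names (`solve`, `col`, `dot`, `obmp`) are hypothetical, and the weighted least squares step is computed through the equivalent linear system $(\tilde{\Psi}_J^* \Psi_J) x_J = \tilde{\Psi}_J^* y$, which assumes $\tilde{\Psi}_J^* \Psi_J$ is invertible (as implied by the RBOP).

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def col(M, k):
    """Return the k-th column of M (a list of rows)."""
    return [row[k] for row in M]


def dot(u, v):
    """Real inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def obmp(Psi, Psi_tilde, y, s):
    """ObMP sketch for real matrices; Psi_tilde = Psi gives conventional OMP."""
    m, n = len(Psi), len(Psi[0])
    J, x = [], [0.0] * n
    for _ in range(s):
        # Residual y - Psi x and oblique matching with the columns of Psi_tilde.
        r = [y[i] - dot(Psi[i], x) for i in range(m)]
        k_star = max((k for k in range(n) if k not in J),
                     key=lambda k: abs(dot(col(Psi_tilde, k), r)))
        J.append(k_star)
        # Weighted least squares on J via (Psi_tilde_J^T Psi_J) x_J = Psi_tilde_J^T y.
        G = [[dot(col(Psi_tilde, a), col(Psi, b)) for b in J] for a in J]
        rhs = [dot(col(Psi_tilde, a), y) for a in J]
        x = [0.0] * n
        for a, v in zip(J, solve(G, rhs)):
            x[a] = v
    return sorted(J), x
```

The residual after each update equals $Ey$ with $E$ the oblique projector of (2.11), so the correlation with any already selected column of $\tilde{\Psi}$ vanishes, in line with the remark that no index is selected twice.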



Next, we present a guarantee of ObMP in terms of the RBOP. 

Proposition 2.7 (A Single Step of ObMP): Let $x^\star \in \mathbb{K}^n$ be $s$-sparse with support $J^\star \subset [n]$. Let $y = \Psi x^\star + z$ and $J \subsetneq J^\star$. Suppose that $\tilde{\Psi}$ and $\Psi$ satisfy

\\Ilj.\jx*\U,-20,+,i^*^)\\nj.\jx*h > I ll'^J'IIH^i'll ) 2nmx||V^,||2||0||2. (2.12) 

where the coordinate projection $\Pi_J$ is defined in (??). Then, the next step of ObMP given $J$ will identify an element of $J^\star \setminus J$.



The following theorem is a direct consequence of Proposition 2.7.

Theorem 2.8 (ObMP): Let $x^\star \in \mathbb{K}^n$ be $s$-sparse with support $J^\star \subset [n]$. Let $y = \Psi x^\star + z$. Suppose that $\Psi$ and $\tilde{\Psi}$ satisfy

min|(x*),| f ^ }^l^-2e,^^{^-^)\ > f J'^^f I 2max||V^,||2||z||2. (2.13) 



Then, ObMP will identify J* exactly. 



If $\tilde{\Psi} = \Psi$, then ObMP reduces to OMP; hence, Proposition 2.7 reduces to the single measurement vector case of [47, Proposition 7.11], with the requirement on $\Psi$ in (??) reduced to

$$\|\Pi_{J^\star \setminus J} x^\star\|_\infty - 2 \delta_{s+1}(\Psi) \|\Pi_{J^\star \setminus J} x^\star\|_2 > 2 \max_j \|\psi_j\|_2 \|z\|_2. \tag{2.14}$$

In fact, the proof of Proposition 2.7 in the Appendix is carried out by modifying that of [47, Proposition 7.11] so that the non-Hermitian case is appropriately managed. Similarly, the guarantee of ObMP in Theorem 2.8 reduces to that of OMP given by

$$\min_{j \in J^\star} |(x^\star)_j| \left( \min_{J \subset J^\star, \, J \neq \emptyset} \frac{\|\Pi_J x^\star\|_\infty}{\|\Pi_J x^\star\|_2} - 2 \delta_{s+1}(\Psi) \right) > 2 \max_j \|\psi_j\|_2 \|z\|_2. \tag{2.15}$$

To satisfy the condition in (??), it is required that $\delta_{s+1}(\Psi) < c$ for some $c \in (0, 1)$ that depends on $x^\star$. As will be shown in Section III, this RIP condition is often not satisfied in a typical scenario of practical applications. In contrast, the analogous condition $\theta_{s+1}(\tilde{\Psi}^* \Psi) < c$ is still satisfied with a properly designed $\tilde{\Psi}$ in the same scenario. Therefore, the guarantee of ObMP in Theorem 2.8 is less demanding than the corresponding guarantee of OMP.



We observe that the bound on the noise amplification in ObMP is larger by the factor ^^^^* ^^^^l^'^jj^^ than in OMP. This factor 



is an upper bound on the spectral norm of the oblique projection onto $\mathcal{R}(\Psi_J)$ along $\mathcal{R}(\tilde{\Psi}_J)^\perp$. The analogous operator in OMP is an orthogonal projector and the spectral norm is trivially bounded from above by 1. However, when oblique matching is used instead of orthogonal matching, this is no longer valid. The spectral norm of the oblique projection is the reciprocal of the cosine of the angle between the two subspaces $\mathcal{R}(\Psi_J)$ and $\mathcal{R}(\tilde{\Psi}_J)$. This result is consistent with the known analysis of oblique projections.†

For the noiseless case ($z = 0$), the sufficient condition in (??) reduces to

$$\theta_{s+1}(\tilde{\Psi}^* \Psi) < \frac{1}{2} \min_{J \subset J^\star, \, J \neq \emptyset} \frac{\|\Pi_J x^\star\|_\infty}{\|\Pi_J x^\star\|_2}. \tag{2.16}$$

Compared to the sufficient condition for ObThres in (??), where, depending on the dynamic range of $x^\star$, the upper bound on the RBOC can be arbitrarily small, the right-hand side in (??) is no smaller than $1/(2\sqrt{s})$ for any $x^\star$. Although ObMP is guaranteed under a milder RBOP condition than ObThres, the corresponding sufficient condition is still demanding compared to those of iterative greedy pursuit algorithms.

† In a general context, unrelated to CS, it has been shown [48] that oblique projectors are suboptimal in terms of minimizing the projection residual, which is however bounded within a factor, given by the reciprocal of the cosine of the angle between the two subspaces, of the optimal error.

However, ObThres and ObMP are important, since they provide basic building blocks for the iterative greedy pursuit algorithms. The thresholding and OMP algorithms have been modified to ObThres and ObMP by replacing two basic blocks, "$\Psi^*$ followed by hard thresholding" and "orthogonal matching", with "$\tilde{\Psi}^*$ followed by hard thresholding" and "oblique matching", respectively. The modifications of these two basic blocks will similarly alter the other greedy pursuit algorithms and their RIP-based guarantees.

In the next section, we present the oblique versions of some iterative greedy pursuit algorithms (CoSaMP, SP, IHT, and HTP). However, the conversion to the oblique version of both algorithm and guarantee is not restricted to these examples. It applies to any other greedy pursuit algorithm that builds on these basic blocks (e.g., Fast Nesterov's Iterative Hard Thresholding (FNIHT) [49]).

C. Iterative Oblique Greedy Pursuit Algorithms

Compressive Sampling Matching Pursuit (CoSaMP) [11] and Subspace Pursuit (SP) [12] are more sophisticated greedy pursuit algorithms that iteratively update the $s$-sparse estimate of $x^\star$. At a high level, both CoSaMP and SP update the estimate of the true support using the following procedure:

1) Augment the estimated set by adding more indices that might include the missing elements of the true support.

2) Refine the augmented set to a subset with $s$ elements.

The two algorithms differ in the size of the increment in the augmentation. More importantly, SP completes each iteration by updating the residual using an orthogonal projection, which is similar to that of OMP. CoSaMP and SP provide RIP-based guarantees, which are comparable to those of $\ell_1$-based solutions such as BP.

Both algorithms use the basic building blocks of correlation maximization by hard thresholding and least squares problems. Therefore, following the same approach we used to modify thresholding and OMP to ObThres and ObMP, we modify CoSaMP and SP to their oblique versions called Oblique CoSaMP (ObCoSaMP) and Oblique SP (ObSP), respectively. ObCoSaMP and ObSP are summarized in Algorithm 3 and Algorithm 4.

Algorithm 3: Oblique Compressive Matching Pursuit (ObCoSaMP)
while stop condition not satisfied do

$\tilde{J}_{t+1} \leftarrow \mathrm{supp}(x_t) \cup \mathrm{supp}(H_{2s}(\tilde{\Psi}^*(y - \Psi x_t)))$;

$\tilde{x} \leftarrow \arg\min_{x} \{ \|\tilde{\Psi}_{\tilde{J}_{t+1}}^\dagger (y - \Psi x)\|_2 : \mathrm{supp}(x) \subset \tilde{J}_{t+1} \}$;

$x_{t+1} \leftarrow H_s(\tilde{x})$;

$t \leftarrow t + 1$;

end






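Algorithm 4: Oblique Subspace Pursuit (ObSP)
while stop condition not satisfied do

$\tilde{J}_{t+1} \leftarrow \mathrm{supp}(x_t) \cup \mathrm{supp}(H_s(\tilde{\Psi}^*(y - \Psi x_t)))$;

$\tilde{x} \leftarrow \arg\min_{x} \{ \|\tilde{\Psi}_{\tilde{J}_{t+1}}^\dagger (y - \Psi x)\|_2 : \mathrm{supp}(x) \subset \tilde{J}_{t+1} \}$;

$J_{t+1} \leftarrow \mathrm{supp}(H_s(\tilde{x}))$;

$x_{t+1} \leftarrow \arg\min_{x} \{ \|\tilde{\Psi}_{J_{t+1}}^\dagger (y - \Psi x)\|_2 : \mathrm{supp}(x) \subset J_{t+1} \}$;

$t \leftarrow t + 1$;

end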



Iterative Hard Thresholding (IHT) [13] and Hard Thresholding Pursuit (HTP) [14] are two other greedy pursuit algorithms with RIP-based guarantees. HTP is a modified version of IHT, which updates the residual using orthogonal projection like SP. Since both IHT and HTP use the same basic building blocks used in the other greedy pursuit algorithms, they too admit the oblique versions. We name these modified versions Oblique IHT (ObIHT) and Oblique HTP (ObHTP). ObIHT and ObHTP are summarized in Algorithm 5 and Algorithm 6. Note that these iterative oblique greedy pursuit algorithms reduce to their conventional counterparts when $\tilde{\Psi} = \Psi$.

Algorithm 5: Oblique Iterative Hard Thresholding (ObIHT)

while stop condition not satisfied do

$x_{t+1} \leftarrow H_s(x_t + \tilde{\Psi}^*(y - \Psi x_t))$;
$t \leftarrow t + 1$;
end
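Algorithm 5 involves only matrix-vector products and hard thresholding, so it can be sketched compactly. The following is a minimal illustration under stated assumptions: real-valued matrices stored as lists of rows, a fixed iteration count in place of the unspecified stop condition, a zero initial estimate, and the hypothetical function name `ob_iht`.

```python
def ob_iht(Psi, Psi_tilde, y, s, iters=50):
    """ObIHT sketch for real matrices; Psi_tilde = Psi gives conventional IHT.

    Assumptions: the stop condition is a fixed number of iterations and
    the initial estimate is the zero vector.
    """
    m, n = len(Psi), len(Psi[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual y - Psi x_t.
        r = [y[i] - sum(Psi[i][k] * x[k] for k in range(n)) for i in range(m)]
        # Gradient-like step x_t + Psi_tilde^T r (no conjugation: real case only).
        g = [x[k] + sum(Psi_tilde[i][k] * r[i] for i in range(m)) for k in range(n)]
        # Hard thresholding H_s: keep the s largest-magnitude entries.
        order = sorted(range(n), key=lambda k: (-abs(g[k]), k))
        keep = set(order[:s])
        x = [g[k] if k in keep else 0.0 for k in range(n)]
    return x
```

When $\tilde{\Psi}^* \Psi$ is the identity (an idealized case with RBOC equal to zero), the first iteration already returns $H_s(x^\star) = x^\star$ in the noiseless setting, and later iterations leave it unchanged.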



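Algorithm 6: Oblique Hard Thresholding Pursuit (ObHTP)
while stop condition not satisfied do

$J_{t+1} \leftarrow \mathrm{supp}(H_s(x_t + \tilde{\Psi}^*(y - \Psi x_t)))$;

$x_{t+1} \leftarrow \arg\min_{x} \{ \|\tilde{\Psi}_{J_{t+1}}^\dagger (y - \Psi x)\|_2 : \mathrm{supp}(x) \subset J_{t+1} \}$;

$t \leftarrow t + 1$;

end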



We briefly review the currently available RIP-based guarantees of the original algorithms. The guarantees of the iterative greedy pursuit algorithms were provided in their original papers [11]-[14]. In particular, Needell and Tropp, in their technical report on CoSaMP [50], showed that CoSaMP (with exact arithmetic) converges within a finite number of iterations, which is at most $O(s)$ for the worst case and can be as small as $O(\ln s)$. We will show that the same analysis applies to SP, HTP, and their oblique versions.
TABLE I

The RBOP condition required for linear convergence in Theorem 2.9

Alg         $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$
ObCoSaMP    $\theta_{4s}(\tilde{\Psi}^* \Psi) < 0.384$
ObSP        $\theta_{3s}(\tilde{\Psi}^* \Psi) < 0.325$
ObIHT       $\theta_{3s}(\tilde{\Psi}^* \Psi) < 0.5$
ObHTP       $\theta_{3s}(\tilde{\Psi}^* \Psi) < 0.577$



The guarantees of the iterative greedy pursuit algorithms are provided by sufficient conditions given in a common form $\delta_{ks}(\Psi) < c$, where the condition becomes more demanding for larger $k$ and smaller $c$. Recently, Foucart [51] refined the guarantees of CoSaMP and IHT by increasing the required $c$. We will show that the guarantee of SP is similarly improved using similar techniques and replacing triangle inequalities by the Pythagorean theorem when applicable.‡

Next, we show that the RIP-based guarantees of the iterative greedy pursuit algorithms are replaced by similar guarantees of the corresponding oblique pursuit greedy algorithms, in terms of the RBOP. In fact, the modification of the guarantees is rather straightforward, as was the modification of the algorithms. We only provide the full derivation for the RBOP-based guarantee of ObSP. Replacing $\tilde{\Psi}$ by $\Psi$ in the result and the derivation will provide an RIP-based guarantee for SP. The guarantees of the other iterative oblique pursuit algorithms (ObCoSaMP, ObIHT, and ObHTP) are obtained by similarly modifying the corresponding results [14], [51]. Therefore, we do not repeat the derivations but only state the results.

Theorem 2.9: Let Alg $\in$ {ObSP, ObCoSaMP, ObIHT, ObHTP}. Let $(x_t)_{t \in \mathbb{N}}$ be the sequence generated by algorithm Alg. Then

$$\|x_{t+1} - x^\star\|_2 \le \rho \|x_t - x^\star\|_2 + \tau \|z\|_2 \tag{2.17}$$

where $\rho$ and $\tau$ are positive constants depending on Alg, given as explicit functions of $\theta_{ks}(\tilde{\Psi}^* \Psi)$, $\delta_{ks}(\Psi)$, and $\delta_{ks}(\tilde{\Psi})$. Moreover, $\rho$, which only depends on $\theta_{ks}(\tilde{\Psi}^* \Psi)$, is less than 1, provided that the condition in Table I specified by Alg is satisfied.



Proof of Theorem 2.9: We only provide the proof for ObSP in Appendix D. The formulae for $\rho$ and $\tau$ are provided for all listed algorithms. ■

For $\rho < 1$, (??) implies that in the noiseless case the iteration converges linearly at rate $\rho$ to the true solution, whereas in the noisy case the error at convergence is bounded as $\|x_\infty - x^\star\|_2 \le \frac{\tau}{1 - \rho} \|z\|_2$.
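The limiting error bound follows by unrolling the one-step recursion of Theorem 2.9. A short derivation, using only the recursion and the geometric series (with $x_0$ denoting the initial estimate, a symbol introduced here for the illustration):

```latex
\|x_t - x^\star\|_2
  \le \rho \, \|x_{t-1} - x^\star\|_2 + \tau \|z\|_2
  \le \rho^t \, \|x_0 - x^\star\|_2 + \tau \Big( \sum_{i=0}^{t-1} \rho^i \Big) \|z\|_2
  = \rho^t \, \|x_0 - x^\star\|_2 + \tau \, \frac{1 - \rho^t}{1 - \rho} \, \|z\|_2 .
```

For $\rho < 1$ the first term decays geometrically, so in the noiseless case an accuracy $\epsilon$ is reached once $t \ge \ln(\|x_0 - x^\star\|_2 / \epsilon) / \ln(1/\rho)$, and in the noisy case the second term tends to $\frac{\tau}{1-\rho}\|z\|_2$ as $t \to \infty$.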

Unlike ObIHT, the other algorithms (ObCoSaMP, ObSP, and ObHTP) involve the step of updating the estimate by solving a least squares problem. This additional step provides the property in Lemma 2.10, which enables the finite convergence of the algorithms.

Lemma 2.10: Let Alg $\in$ {ObSP, ObCoSaMP, ObHTP}. Let $(x_t)_{t \in \mathbb{N}}$ be the sequence generated by Alg. Then, the approximation error $\|x_t - x^\star\|_2$ is less than the $\ell_2$ norm of the missed components of $x^\star$ to within a constant factor $\bar{\rho}$ plus the noise term, i.e.,

$$\|x_{t+1} - x^\star\|_2 \le \bar{\rho} \, \|\Pi_{J^\star \setminus \mathrm{supp}(x_{t+1})} x^\star\|_2 + \bar{\tau} \|z\|_2 \tag{2.18}$$

where $\bar{\rho}$ and $\bar{\tau}$ are positive constants given as explicit functions (depending on Alg) of $\theta_{ks}(\tilde{\Psi}^* \Psi)$, $\delta_{ks}(\Psi)$, and $\delta_{ks}(\tilde{\Psi})$.

‡ As an aside, inspired by the existing RIP analysis that $\delta_{ks}(\Psi) < c$ holds with $m = O(k s c^{-2} \ln^4 n)$, Foucart [51] proposed to compare sufficient conditions by comparing the values of $k c^{-2}$. Nevertheless, this comparison is heuristic and only relies on sufficient conditions for the worst case guarantee. Therefore, it is not necessarily true that an algorithm with smaller $k c^{-2}$ performs better.

Proof of Lemma 2.10: Lemma 2.10 is an intermediate step for proving Theorem 2.9. For example, for ObSP, it corresponds to Lemma A.9 in Appendix D. For the other algorithms, we only provide the formulae for $\bar{\rho}$ and $\bar{\tau}$ in Appendix D. ■

Needell and Tropp [50] showed finite convergence of CoSaMP. The same analysis also applies to ObCoSaMP, ObSP, and ObHTP. To show this, let us recall the relevant definitions from the technical report on CoSaMP [50]. The component bands $(B_j)$ of $x^\star$ are defined by

$$B_j = \{ k \in [n] : 2^{-(j+1)} \|x^\star\|_2^2 < |(x^\star)_k|^2 \le 2^{-j} \|x^\star\|_2^2 \}, \quad \forall j \in \mathbb{N} \cup \{0\}.$$



Then, the profile of x* is defined as the number of nonempty component bands. By definition, the profile of x* is not greater 
than the sparsity level of x* . 

Lemma 2.11 (A Paraphrase of [50, Theorem B.1]): Let $p$ be the profile of $x^\star$. Suppose that $(x_t)_{t \in \mathbb{N}}$ satisfies ????. Then,
for 



t > L + pln ( 1 + 2 |p+ -(1 -p 



[p+ ^{i-p-v)] 



it holds that 



In 



1 



1 — 77 



(2.19) 



\\xt - X*\\2 



1- p-T] 



1-P 



The minimal number of iterations for the convergence (the right-hand side of (??)) is maximized when $p = s$ [50]. The following theorem is a direct consequence of Theorem 2.9, Lemma 2.10, and Lemma 2.11.

Theorem 2.12: Let Alg $\in$ {ObSP, ObCoSaMP, ObHTP}. Suppose that $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$ holds depending on Alg as in Table I. After $t_{\max} = C_1 (s + 1)$ iterations, Alg provides an estimate $\hat{x}$ satisfying $\|\hat{x} - x^\star\|_2 \le C_2 \|z\|_2$. Here, $k$, $c$, $C_1$, and $C_2$ are constants, specified by Alg.

The fast convergence of iterative greedy pursuit algorithms that involve the least squares steps is important. When the problem is large (e.g., in CS imaging, the image size is typically 512 x 512 pixels), solving the least squares problems is the most computationally demanding step of the recovery algorithms. Empirically, as the theory suggests, the iterative algorithms (ObCoSaMP, ObSP, and ObHTP) converge within at most $O(s)$ iterations, and are even more computationally efficient than the non-iterative ObMP.

Remark 2.13: The extension of greedy pursuit algorithms and their RIP-based guarantees to those based on the RBOP is not restricted to the aforementioned algorithms. For example, Fast Nesterov's Iterative Hard Thresholding (FNIHT) [49] is another promising algorithm with an RIP-based guarantee, which will extend likewise.

III. Restricted Biorthogonality Property 

In this section, we show that the RBOP-based guarantees of oblique pursuits apply to realistic models of compressed sensing systems in practice. For example, when applied to random frame matrices, the guarantees remain valid even though the i.i.d. sampling is done according to a nonuniform distribution. Recall that the guarantees of oblique pursuits in Section II required $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$ where $k \in \{2, 3, 4\}$ and $c \in (0, 1)$ are constants specified by the algorithm in question. The noise amplification in the reconstruction for these guarantees also depends on $\delta_{ks}(\Psi)$ and $\delta_{ks}(\tilde{\Psi})$. However, unlike $\theta_{ks}(\tilde{\Psi}^* \Psi)$, the RICs $\delta_{ks}(\Psi)$ and $\delta_{ks}(\tilde{\Psi})$ need not be less than 1 to provide the guarantees. In fact, as discussed later, reasonable upper bounds on $\delta_{ks}(\Psi)$ and on $\delta_{ks}(\tilde{\Psi})$ (possibly larger than 1) are obtained with no additional conditions whenever $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$ is achieved. Therefore, we may focus on the condition $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$. Also recall that the guarantees for the corresponding conventional pursuit algorithms require $\delta_{ks}(\Psi) < c$, for $k \in \{2, 3, 4\}$, $c \in (0, 1)$, with the same $k$ and $c$ as the corresponding oblique pursuits. To compare the guarantees of the oblique vs. the conventional pursuit algorithms, assuming $k \in \{2, 3, 4\}$ and $c \in (0, 1)$ arbitrarily fixed constants, we compare the difficulty in achieving the respective bounds on $\delta_{ks}(\Psi)$ and $\theta_{ks}(\tilde{\Psi}^* \Psi)$. While both properties are guaranteed when $m = O(s \ln^4 n)$, $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$ is achieved without additional conditions required for achieving $\delta_{ks}(\Psi) < c$, which are often violated in practical compressed sensing.

A. General Estimate 



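We extend [20, Theorem 8.4] to the following theorem, so that it provides an upper bound on $\theta_s(\tilde{\Psi}^* \Psi)$.

Theorem 3.1: Let $\Psi, \tilde{\Psi} \in \mathbb{K}^{m \times n}$ be random matrices not necessarily mutually independent, each with i.i.d. rows with elements bounded in magnitude as

$$\max_{k, \ell} |(\Psi)_{k, \ell}| \le \frac{K}{\sqrt{m}} \quad \text{and} \quad \max_{k, \ell} |(\tilde{\Psi})_{k, \ell}| \le \frac{\tilde{K}}{\sqrt{m}} \tag{3.1}$$

for $K, \tilde{K} \ge 1$. Then, $\theta_s(\tilde{\Psi}^* \Psi) < \delta + \theta_s(\mathbb{E} \tilde{\Psi}^* \Psi)$ holds with probability $1 - \eta$ provided that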



m ^ Ci6-^ [K^J2 + 6's(E^'**) + K^2 + 6's(E^'**) j ^(In.s)^ Innlnm, (3.2) 
m ^ C2S-'^Kmax{K, K)s Htj'^) (3.3) 



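for universal constants $C_1$ and $C_2$.

Proof of Theorem 3.1: See Appendix E.

Letting $\tilde{\Psi} = \Psi$ in Theorem 3.1 provides the following corollary.§

Corollary 3.2: Let $\Psi \in \mathbb{K}^{m \times n}$ be a random matrix with i.i.d. rows with elements bounded in magnitude as $\max_{k, \ell} |(\Psi)_{k, \ell}| \le \frac{K}{\sqrt{m}}$ for $K \ge 1$. Then, $\delta_s(\Psi) < \delta + \theta_s(\mathbb{E} \Psi^* \Psi)$ holds with probability $1 - \eta$ provided that

$$m \ge C_1 \delta^{-2} K^2 \, 4 \left[ 2 + \theta_s(\mathbb{E} \Psi^* \Psi) \right] s (\ln s)^2 \ln n \ln m, \tag{3.4}$$
$$m \ge C_2 \delta^{-2} K^2 s \ln(\eta^{-1}) \tag{3.5}$$

for universal constants $C_1$ and $C_2$.

The following corollary is obtained by combining Theorem 3.1 and Corollary 3.2 applied to $\Psi$ and to $\tilde{\Psi}$, respectively. Corollary 3.3 aims to provide an upper bound on $\theta_s(\tilde{\Psi}^* \Psi)$. It also provides upper bounds on both $\delta_s(\Psi)$ and $\delta_s(\tilde{\Psi})$.

Corollary 3.3: Let $\Psi, \tilde{\Psi} \in \mathbb{K}^{m \times n}$ be random matrices with i.i.d. rows with elements bounded in magnitude as $\max_{k, \ell} |(\Psi)_{k, \ell}| \le \frac{K}{\sqrt{m}}$ and $\max_{k, \ell} |(\tilde{\Psi})_{k, \ell}| \le \frac{\tilde{K}}{\sqrt{m}}$ for $K, \tilde{K} \ge 1$. Then, $\theta_s(\tilde{\Psi}^* \Psi) < \delta + \theta_s(\mathbb{E} \tilde{\Psi}^* \Psi)$, $\delta_s(\Psi) < \delta + \theta_s(\mathbb{E} \Psi^* \Psi)$, and $\delta_s(\tilde{\Psi}) < \delta + \theta_s(\mathbb{E} \tilde{\Psi}^* \tilde{\Psi})$ hold with probability $1 - \eta$ provided that

$$m \ge C_1 \delta^{-2} \max(K^2, \tilde{K}^2) \, 4 \left[ 2 + \max\left( \theta_s(\mathbb{E} \Psi^* \Psi), \theta_s(\mathbb{E} \tilde{\Psi}^* \tilde{\Psi}) \right) \right] s (\ln s)^2 \ln n \ln m, \tag{3.6}$$
$$m \ge C_2 \delta^{-2} \max(K^2, \tilde{K}^2) s \ln(\eta^{-1}) \tag{3.7}$$

for universal constants $C_1$ and $C_2$.

§ A direct derivation of Corollary 3.2 might provide better constants, but we do not attempt to optimize the universal constants.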



Corollary 3.2 and Corollary 3.3 have very different implications. Corollary 3.2 guarantees that $\delta_{ks}(\Psi) < c$ holds with high probability when $m = O(s \ln^4 n)$ if $\max_{k, \ell} |(\Psi)_{k, \ell}| = O(m^{-1/2})$ and $\theta_{ks}(\mathbb{E} \Psi^* \Psi) < 0.5c$. The former condition implies that the rows of $\Psi$ are incoherent to the standard basis vectors and is called the incoherence property. As will be discussed in later subsections, the latter condition, $\theta_{ks}(\mathbb{E} \Psi^* \Psi) < 0.5c$, is often difficult to satisfy for small $c \in (0, 1)$, in particular, in practical settings of compressed sensing. Although this condition has not been shown to be a necessary condition for $\delta_{ks}(\Psi) < c$, no alternative analysis is available for random frame matrices. In contrast, $\theta_{ks}(\mathbb{E} \tilde{\Psi}^* \Psi)$ can be made small by an appropriate choice of $\tilde{\Psi}$, which by Corollary 3.3 suffices to make $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$. In fact, it is often the case that $\tilde{\Psi}$ can be chosen to make $\theta_{ks}(\mathbb{E} \tilde{\Psi}^* \Psi)$ much smaller than $\theta_{ks}(\mathbb{E} \Psi^* \Psi)$, or even zero, and to satisfy the incoherence property at the same time. In this case, $\theta_{ks}(\tilde{\Psi}^* \Psi) < c$ is guaranteed, whereas $\delta_{ks}(\Psi) < c$ is not. This key difference in the guarantees in Corollaries 3.2 and 3.3 establishes the advertised result that the RBOP-based guarantees of oblique pursuits apply to more general cases, in which the RIP-based guarantees of the corresponding conventional pursuits fail.

In the next subsections, we elaborate the comparison of the two different approaches: oblique pursuits with RBOP-based guarantees vs. conventional pursuits with RIP-based guarantees (per Corollaries 3.2 and 3.3) in more concrete scenarios in which $\Psi$ is given as the composition of the sensing matrix $A$ obtained from a frame and the dictionary $D$ with certain properties.

B. Case I: Sampled Frame A and Nonredundant D of Full Rank 

We first consider the case of $\Psi = AD$, where the sensing matrix $A$ is constructed from a frame $(\phi_\omega)_{\omega \in \Omega}$ by (??) using a probability measure $\nu$, and the sparsifying dictionary $D$ is nonredundant ($n \le d$) with full column rank.

Using the isotropy property, $\mathbb{E} A^* A = I_d$, conventional RIP analysis [20, Theorem 8.4] showed that $\delta_s(\Psi) < \delta$ holds with high probability for $m = O(\delta^{-2} s \ln^4 n)$ under the following ideal assumptions:

(AI-1) $(\phi_\omega)_{\omega \in \Omega}$ is a tight frame, i.e., $\Phi \Phi^* = I_d$, where $\Phi$ and $\Phi^*$ denote the associated synthesis and analysis operators.

(AI-2) $\nu$ is the uniform measure.

(AI-3) $D^* D = I_n$.

Corollary 3.2 generalizes [20, Theorem 8.4], so that the same RIP result continues to hold when the ideal assumptions are "slightly" violated. To quantify this statement, we introduce the following metrics that measure the deviation from the ideal assumptions.
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Nonuniform distribution v: We additionally assume that v is absolutely continuous with respect to Define 

„ dv dv 
I'min — ess mi — (cli) and !/n^ax — ess sup — (w) (3.8) 

weO tjeO d^ 

where the essential infimum and supremum are w.rt. to the measure v. If 17 is a finite set, then ^(i^) reduces to the 
probability that cj e will be chosen, multiplied by the cardinality of Vl. By their definitions, i^min and z^max satisfy 
i^min 1 J^max- Notc that Vrain ^nd i^max measure how different v is from the uniform measure ji. In particular, 
Vmin = '^max = 1 if coincides with fl. 

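• Non-tight frame $(\phi_\omega)_{\omega \in \Omega}$: Multiplying $\Psi$ and $y$ by a common scalar does not modify the inverse problem $\Psi x = y$. Therefore, replacing $\Phi$ by the same matrix multiplied by an appropriate scalar, we assume without loss of generality that

\[
\lambda_1(\Phi\Phi^*) = 1 + \frac{\kappa(\Phi\Phi^*) - 1}{\kappa(\Phi\Phi^*) + 1}
\tag{3.9}
\]

and

\[
\lambda_d(\Phi\Phi^*) = 1 - \frac{\kappa(\Phi\Phi^*) - 1}{\kappa(\Phi\Phi^*) + 1}
\tag{3.10}
\]

where $\kappa(\Phi\Phi^*)$ denotes the condition number of $\Phi\Phi^*$. Equations (3.9) and (3.10) imply

\[
\theta_d(\Phi\Phi^*) = \|\Phi\Phi^* - I_d\| = \frac{\kappa(\Phi\Phi^*) - 1}{\kappa(\Phi\Phi^*) + 1},
\]

where the first identity follows from the definition of $\theta_d$. Note that $\theta_d(\Phi\Phi^*) = 0$ if $\Phi\Phi^* = I_d$.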

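• Non-orthonormal $D$: Similarly, for nonredundant $D$, we assume without loss of generality that

\[
\lambda_1(D^*D) = 1 + \frac{\kappa(D^*D) - 1}{\kappa(D^*D) + 1}
\tag{3.11}
\]

and

\[
\lambda_n(D^*D) = 1 - \frac{\kappa(D^*D) - 1}{\kappa(D^*D) + 1}
\tag{3.12}
\]

where $\kappa(D^*D)$ denotes the condition number of $D^*D$. Equations (3.11) and (3.12) imply

\[
\delta_n(D) = \|D^*D - I_n\| = \frac{\kappa(D^*D) - 1}{\kappa(D^*D) + 1}.
\]

Note that $\delta_n(D) = 0$ if $D$ corresponds to an orthonormal basis, i.e., $D^*D = I_n$.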
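The rescaling above can be checked numerically. The sketch below assumes the normalization places the extreme eigenvalues of the Gram matrix symmetrically around 1, and that the condition number of $D$ is the ratio of its extreme singular values, so that the Gram matrix $D^*D$ has the square of that condition number. The function names are illustrative.

```python
def normalized_extremes(kappa_gram):
    """Extreme eigenvalues of a Gram matrix after symmetric rescaling.

    Returns (lambda_max, lambda_min, deviation), where the deviation
    (kappa - 1) / (kappa + 1) is the distance of both extremes from 1.
    """
    if kappa_gram < 1:
        raise ValueError("a condition number is at least 1")
    dev = (kappa_gram - 1.0) / (kappa_gram + 1.0)
    return 1.0 + dev, 1.0 - dev, dev


def delta_n_from_cond(kappa_D):
    """Deviation ||D*D - I|| for a dictionary D with condition number kappa_D."""
    # Assumption: kappa(D*D) = kappa(D)^2.
    return normalized_extremes(kappa_D ** 2)[2]


lam_max, lam_min, dev = normalized_extremes(4.0)
print(lam_max / lam_min)                  # the rescaling preserves the condition number
print(round(delta_n_from_cond(1.99), 2))  # dictionary with condition number 1.99
print(delta_n_from_cond(2.0))             # dictionary with condition number 2
```

Under these assumptions a dictionary with condition number 2 already gives a deviation of 0.6, which is the kind of value the discussion of RIP thresholds is concerned with.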



Now, invoking Corollary 3.2 with the above metrics, we obtain the following Theorem 3.4, of which Theorem 1.6 is a simplified version. Under the ideal assumptions, $K_0$ vanishes and Theorem 3.4 reduces to [20, Theorem 8.4].

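Theorem 3.4: Let $(\phi_\omega)_{\omega \in \Omega}$ and $D = [d_1, \ldots, d_n] \in \mathbb{K}^{d \times n}$ satisfy $\sup_\omega \max_j |\langle \phi_\omega, d_j \rangle| \le K$ for $K \ge 1$. Let $A \in \mathbb{K}^{m \times d}$ be constructed from $(\phi_\omega)_{\omega \in \Omega}$ by (??) using a probability measure $\nu$ and let $\Psi = AD$. Let $\nu_{\min}$ and $\nu_{\max}$ be defined in (3.8). Then, $\delta_s(\Psi) < \delta + K_0$ holds with probability $1 - \eta$ provided that

\[
m \ge C_1 (1 + K_0)^2 K^2 \delta^{-2} s (\ln s)^2 \ln n \ln m
\quad \text{and} \quad
m \ge C_2 K^2 \delta^{-2} s \ln(\eta^{-1})
\]

for universal constants $C_1$ and $C_2$, where $K_0$ is given in terms of $\nu_{\min}$, $\nu_{\max}$, $\delta_s(D)$, and $\theta_d(\Phi\Phi^*)$ by

\[
K_0 = \max(1 - \nu_{\min},\ \nu_{\max} - 1) + \nu_{\max} \left[ \delta_s(D) + \theta_d(\Phi\Phi^*) + \delta_s(D)\,\theta_d(\Phi\Phi^*) \right].
\tag{3.13}
\]

Proof: See Appendix F. ■

¹ If $\Omega$ is a finite set, then $\mu$ is the counting measure and any probability measure $\nu$ is absolutely continuous.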
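To get a feel for how each deviation from the ideal assumptions inflates the RIP bound, the constant in Theorem 3.4 can be evaluated for a few settings. The sketch below assumes that (3.13) reads $K_0 = \max(1 - \nu_{\min}, \nu_{\max} - 1) + \nu_{\max}[\delta_s(D) + \theta_d(\Phi\Phi^*) + \delta_s(D)\theta_d(\Phi\Phi^*)]$; the function name `k0` and the numerical inputs are hypothetical.

```python
def k0(nu_min, nu_max, delta_s_D, theta_d):
    """Evaluate the assumed form of the constant K_0 in (3.13)."""
    if not (0 < nu_min <= 1 <= nu_max):
        raise ValueError("expected 0 < nu_min <= 1 <= nu_max")
    if delta_s_D < 0 or theta_d < 0:
        raise ValueError("deviation metrics must be nonnegative")
    sampling_term = max(1 - nu_min, nu_max - 1)
    frame_dictionary_term = delta_s_D + theta_d + delta_s_D * theta_d
    return sampling_term + nu_max * frame_dictionary_term


# Ideal case: uniform sampling, tight frame, orthonormal dictionary.
print(k0(1.0, 1.0, 0.0, 0.0))
# Only the dictionary deviates (hypothetical delta_s(D) = 0.6).
print(k0(1.0, 1.0, 0.6, 0.0))
# Only the sampling density deviates (hypothetical nu_min = 0.5, nu_max = 2).
print(k0(0.5, 2.0, 0.0, 0.0))
```

With this reading the constant vanishes under the ideal assumptions, while a moderately nonuniform density alone can already push the resulting bound on the restricted isometry constant beyond 1.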



Theorem 3.4 shows that the ideal assumptions (AI-1) - (AI-3) for achieving the RIP of $\Psi$ can be relaxed to a certain extent. However, even the relaxed assumptions are still too demanding to be satisfied in many practical applications of compressed sensing. When the ideal assumptions are not all satisfied, each deviation increases $K_0$, and the obtained upper bound on $\delta_s(\Psi)$ also increases. For example, when $\Phi\Phi^* = I_d$ and $D^*D = I_n$, depending on $\nu$, the upper bound on $\delta_s(\Psi)$ may turn out to be even larger than 1, which fails to provide an RIP-based guarantee. As another example, when $\nu = \mu$ and $\Phi\Phi^* = I_d$ (the rows of $A$ are obtained from i.i.d. samples from a tight frame according to the uniform distribution), $\delta_s(D)$ determines the quality of the upper bound. Although, in general, computation of $\delta_s(D)$ is NP-hard, an easy upper bound on $\delta_s(D)$ is given as $\delta_n(D) = \|D^*D - I_n\|$. Now, note that $\delta_n(D) \ge 0.6$ for $\kappa(D) \ge 2$. For comparison, the RIP-based guarantee of HTP [14] requires $\delta_{3s}(\Psi) < 0.57$, which is the largest upper bound on $\delta_{3s}(\Psi)$ among all sufficient conditions for known RIP-based guarantees. This suggests that even when the other ideal assumptions are satisfied, $D$ needs to be near ideally conditioned. This strong requirement on $D$ is often too restrictive, in particular, for learning a data-adaptive dictionary $D$.

Next, we show that $\theta_s(\tilde{\Psi}^*\Psi) < c$ is achieved more easily, without the aforementioned restriction on $\Phi$, $\nu$, or $D$. To this end, we would like to use Corollary 3.3; however, the $\tilde{K}$ parameter in Corollary 3.3 requires further attention. While the incoherence parameter $K$ is determined by the inverse problem, the other incoherence parameter $\tilde{K}$ is determined by our own choice of $\tilde{A}$ and $\tilde{D}$. Recall the construction of $\tilde{\Psi} = \tilde{A}\tilde{D}$: matrix $\tilde{A} \in \mathbb{K}^{m \times d}$ is constructed from the dual frame $(\tilde{\phi}_\omega)_{\omega \in \Omega}$ by (??) using the same probability measure $\nu$ used to construct $A$ per (??), whereas $\tilde{D}$ is given as $\tilde{D} = D(D^*D)^{-1}$, so that $\tilde{D}^*D = I_n$. It follows that $\tilde{K}$ is related to $\Phi$ and $D$, and thus to $K$. By deriving an upper bound on $\tilde{K}$ in terms of $K$ and using it in Corollary 3.3, we obtain the following theorem.

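Theorem 3.5: Let $(\phi_\omega)_{\omega \in \Omega}$ and $D = [d_1, \ldots, d_n] \in \mathbb{K}^{d \times n}$ satisfy $\sup_\omega \max_j |\langle \phi_\omega, d_j \rangle| \le K$ for $K \ge 1$. Let $\nu$ be a probability measure on $\Omega$ such that its derivative $\frac{d\nu}{d\mu}$ is strictly positive. Let $A, \tilde{A} \in \mathbb{K}^{m \times d}$ be random matrices constructed from a biorthogonal frame $(\phi_\omega, \tilde{\phi}_\omega)_{\omega \in \Omega}$ by (??) and (??), respectively, using $\nu$. Let $\Psi = AD$ and $\tilde{\Psi} = \tilde{A}\tilde{D}$ where $\tilde{D} = D(D^*D)^{-1}$. Let $\nu_{\min}$ and $\nu_{\max}$ be defined in (3.8). Then, $\theta_s(\tilde{\Psi}^*\Psi) < \delta$, $\delta_s(\Psi) < \delta + K_1$, and $\delta_s(\tilde{\Psi}) < \delta + K_1$ hold with probability $1 - \eta$ provided that

\[
m \ge C_1 (1 + K_1)^2 K_2^2 \delta^{-2} s (\ln s)^2 \ln n \ln m,
\tag{3.14}
\]
\[
m \ge C_2 K_2^2 \delta^{-2} s \ln(\eta^{-1})
\tag{3.15}
\]

for universal constants $C_1$ and $C_2$, where $K_1$ and $K_2$ are given in terms of $K$, $\nu_{\min}$, $\nu_{\max}$, $\delta_n(D)$, and $\theta_d(\Phi\Phi^*)$ by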

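\[
K_1 = \max(1 - \nu_{\max}^{-1},\ \nu_{\min}^{-1} - 1) + \max(\nu_{\max},\ \nu_{\min}^{-1}) \left[ \frac{1}{1 - \delta_n(D)} + \frac{\theta_d(\Phi\Phi^*)}{1 - \theta_d(\Phi\Phi^*)} + \frac{\delta_n(D)\,\theta_d(\Phi\Phi^*)}{[1 - \delta_n(D)][1 - \theta_d(\Phi\Phi^*)]} \right]
\tag{3.16}
\]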



and 



Ao = 



mm 



K + sup||0„||^d • • max 11^0 



(3.17) 



Proof: See Appendix G. ■

With any significant violation of the ideal assumptions (AI-1) - (AI-3), Theorem 3.4 fails to provide $\delta_{ks}(\Psi) < c$, whereas Theorem 3.5 still provides $\theta_{ks}(\tilde{\Psi}^*\Psi) < c$. Therefore, the RBOP-based guarantee of recovery by oblique pursuits is a significant improvement over the conventional RIP-based guarantees, in the sense that the former applies to a practical setup (subset selection with a nonuniform distribution, non-tight frame, and non-orthonormal dictionary) while the latter does not. This is because violation of the ideal assumptions does not affect the upper bound on $\theta_s(\tilde{\Psi}^*\Psi)$ in Theorem 3.5. Instead, it increases the upper bounds on $\delta_s(\Psi)$ and $\delta_s(\tilde{\Psi})$. However, in the guarantees of oblique pursuits, unlike $\theta_{ks}(\tilde{\Psi}^*\Psi)$, the restricted isometry constants $\delta_s(\Psi)$ and $\delta_s(\tilde{\Psi})$ need not be bounded from above by a certain threshold.



Example 3.6: We show the implication of Theorem 3.5 in a 2D Fourier imaging example. The corresponding numerical results for this scenario can be found in Section IV. The measurements are taken over random frequencies sampled i.i.d. from the uniform 2D lattice grid $\Omega$ with a nonuniform measure $\nu$. The signal of interest is sparse over a data-adaptive dictionary $D$, which is invertible ($n = d$) and has block diagonal structure.

More specifically, $D$ in this example is constructed as follows. Recently, Ravishankar and Bresler [52] proposed an efficient algorithm that learns a data-adaptive square transform $T$ with a regularizer on its condition number. When the condition number of $T$ is reasonably small, $D$ given by $D = T^{-1}$ serves as a good dictionary for sparse representation. In particular, they designed a patch-based transform $T$ that applies to each patch of the image. When the patches are nonoverlapping, $T$ and $D$ have block diagonal structure; hence, applying $D$ and $D^*$ is computationally efficient. Furthermore, when the patches are much smaller than the image, each atom in $D$ is sparse and has low mutual coherence to the Fourier transform that applies to the entire image. For example, $D \in \mathbb{C}^{512 \times 512}$ used in the numerical experiment in Section IV was designed so that it applies to $8 \times 8$ pixel patches. It has condition number 1.99, which implies $\delta_n(D) = 0.60$. We also observed that $D$ satisfies

\\{D*D)-^,.^,.=2.Vi. 

Since $(\phi_\omega)_{\omega \in \Omega}$ corresponding to the 2D DFT is tight, it follows that $\theta_d(\Phi\Phi^*) = 0$. Therefore, the expressions for $K_1$ and $K_2$ in (3.16) and (3.17) reduce to

\[
K_1 = \max(1 - \nu_{\max}^{-1},\ \nu_{\min}^{-1} - 1) + 2.5 \max(\nu_{\max},\ \nu_{\min}^{-1})
\tag{3.18}
\]

and

\[
K_2 = \frac{2.13}{\nu_{\min}} K.
\tag{3.19}
\]

Recall that $\nu_{\min}$ and $\nu_{\max}$ in this scenario correspond to the minimum and maximum probability that a measurement is taken at a certain frequency component. The simplified expressions of $K_1$ and $K_2$ in (3.18) and (3.19) show quantitatively how the use of a nonuniform distribution for the i.i.d. sampling in the construction of a random frame matrix increases the required number of measurements.
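The effect of the sampling density on the measurement bound of Theorem 3.5 can be illustrated numerically. The sketch below assumes that (3.18) reads $K_1 = \max(1 - \nu_{\max}^{-1}, \nu_{\min}^{-1} - 1) + 2.5\max(\nu_{\max}, \nu_{\min}^{-1})$, that $K_2$ in (3.19) is proportional to $K/\nu_{\min}$, and that the bound (3.14) scales as $(1 + K_1)^2 K_2^2$. The function names and the density values are hypothetical.

```python
def k1_example(nu_min, nu_max):
    """Assumed form of K_1 in (3.18) for the 2D Fourier imaging example."""
    if not (0 < nu_min <= 1 <= nu_max):
        raise ValueError("expected 0 < nu_min <= 1 <= nu_max")
    return max(1 - 1 / nu_max, 1 / nu_min - 1) + 2.5 * max(nu_max, 1 / nu_min)


def measurement_inflation(nu_min, nu_max):
    """Ratio of the (1 + K_1)^2 K_2^2 factor to its value for uniform sampling.

    Only the 1 / nu_min dependence of K_2 is used, so the constant
    multiplying K / nu_min in (3.19) cancels in the ratio.
    """
    k1_uniform = k1_example(1.0, 1.0)
    k1 = k1_example(nu_min, nu_max)
    return ((1 + k1) / (1 + k1_uniform)) ** 2 * (1 / nu_min) ** 2


print(k1_example(1.0, 1.0))             # uniform sampling
print(k1_example(0.5, 2.0))             # hypothetical variable-density sampling
print(measurement_inflation(0.5, 2.0))  # growth of the sufficient number of measurements
```

Under these assumptions, halving the smallest density and doubling the largest multiplies the sufficient number of measurements by a factor of 16 relative to uniform sampling.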

C. Case II: Sampled Frame A and Overcomplete D with the RIP 

The analysis in the previous section focused on the case where the dictionary $D$ is not redundant. In fact, though, the analysis extends to certain cases of redundant/overcomplete $D$. One such case is when $D$ is, like $A$, a random frame matrix. Then, using a construction similar to our construction of $\tilde{A}$ will produce a matrix $\tilde{D}$ with $\mathbb{E}\tilde{D}^*D = I_n$, which combined with $\mathbb{E}\tilde{A}^*A = I_d$ provides $\mathbb{E}\tilde{\Psi}^*\Psi = I_n$. However, usually, $D$ is given as a deterministic matrix (e.g., concatenation of analytic bases, analytic frame, data-adaptive dictionary, etc.). Therefore, in the general redundant $D$ case, using the biorthogonal dual of $D$ as $\tilde{D}$ is not a promising approach. Instead, we focus in the remainder of this subsection on the case where $D$ satisfies the RIP with small $\delta_s(D)$. Using $\tilde{\Psi} = \tilde{A}D$, we show the RBOP of $(\Psi, \tilde{\Psi})$ in this case.

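Theorem 3.7: Let $(\phi_\omega)_{\omega \in \Omega}$ and $D = [d_1, \ldots, d_n] \in \mathbb{K}^{d \times n}$ satisfy $\sup_\omega \max_j |\langle \phi_\omega, d_j \rangle| \le K$ for $K \ge 1$. Let $A, \tilde{A} \in \mathbb{K}^{m \times d}$ be random matrices constructed from a biorthogonal frame $(\phi_\omega, \tilde{\phi}_\omega)_{\omega \in \Omega}$ by (??) and (??), respectively, using a probability measure $\nu$. Suppose that $\delta_s(D) < 1$. Let $\Psi = AD$ and $\tilde{\Psi} = \tilde{A}D$. Let $\nu_{\min}$ and $\nu_{\max}$ be defined in (3.8). Then, $\theta_s(\tilde{\Psi}^*\Psi) < \delta + \delta_s(D)$, $\delta_s(\Psi) < \delta + K_1$, and $\delta_s(\tilde{\Psi}) < \delta + K_1$ hold with probability $1 - \eta$ provided that

\[
m \ge C_1 (1 + K_1)^2 K_2^2 \delta^{-2} s (\ln s)^2 \ln n \ln m,
\tag{3.20}
\]
\[
m \ge C_2 K_2^2 \delta^{-2} s \ln(\eta^{-1})
\tag{3.21}
\]

for universal constants $C_1$ and $C_2$, where $K_1$ and $K_2$ are given in terms of $K$, $\nu_{\min}$, $\nu_{\max}$, $\delta_s(D)$, and $\theta_d(\Phi\Phi^*)$ by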

Ki = max(l - ^--^1^, - 1) + max^iy^,^^, v^^^lj 

and 



Proof: See Appendix H. ■

D. Case III: Sampled Tight Frame A and Orthonormal Basis D / RIP Matrix D 

In the special case where the use of a nonuniform distribution for the i.i.d. sampling in the construction of $A$ is the only cause for the resulting failure of the exact/near isotropy property, the failure of the conventional RIP analysis can be fixed differently. Recall that the construction of $\tilde{A}$ in (??) only involves the weighting of rows of a matrix obtained from the biorthogonal dual frame $(\tilde{\phi}_\omega)_{\omega \in \Omega}$, with sampling at the same indices as used for the construction of $A$ from the frame $(\phi_\omega)_{\omega \in \Omega}$. Therefore, for the special case when $(\phi_\omega)_{\omega \in \Omega}$ is a tight frame and $D^*D = I_n$, it is possible to derive the RIP of a preconditioned version of $\Psi$.

We construct a preconditioned sensing matrix $\hat{A}$ as



J,, Vfce [m], ee[d] (3.22) 



where $(\omega_k)_{k=1}^{m}$ are the same sampling points used in the construction of $A$ in (??). Then, by construction, $\hat{A}$ satisfies the isotropy property $\mathbb{E}\hat{A}^*\hat{A} = I_d$. Furthermore, if $\sup_\omega \max_j |\langle \phi_\omega, d_j \rangle| \le K$, then $\max_{k,\ell} |(\hat{\Psi})_{k,\ell}| \le \nu_{\min}^{-1/2} K$ holds, where $\hat{\Psi} = \hat{A}D$.

In this case, it suffices to invoke [20, Theorem 8.4] to show the RIP of $\hat{\Psi}$. Invoking instead Theorem 3.4, this approach extends in a straightforward way to the case where $D$ satisfies the RIP. In the case of tight frame $A$ and $D$ that is an orthobasis or an RIP matrix, these results provide an alternative (and equivalent) approach to obtain guaranteed algorithms, without invoking RBOP. In particular, defining $\Lambda$ as the diagonal matrix given by $(\Lambda)_{k,k} = [(d\nu/d\mu)(\omega_k)]^{-1/2}$ for $k \in [m]$, conventional recovery algorithms with an RIP-based guarantee can be used to solve the modified inverse problem $\Lambda\Psi x = \Lambda y$.
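The effect of this row reweighting on isotropy can be verified exactly on a small example. The sketch below assumes a 1D DFT frame with unit-modulus entries on a finite index set and a hypothetical nonuniform sampling distribution; it compares the expected Gram matrix of a single sampled row with and without scaling by the inverse square root of the density. All names are illustrative.

```python
import cmath


def dft_row(n, w):
    """Row w of the n-point DFT with unit-modulus entries."""
    return [cmath.exp(-2j * cmath.pi * w * l / n) for l in range(n)]


def expected_gram(probs, reweight):
    """E[a^* a] for one row a drawn with P(w) = probs[w].

    With reweight=True, row w is scaled by (n * probs[w]) ** -0.5, the
    inverse square root of the density relative to the uniform measure.
    """
    n = len(probs)
    gram = [[0j] * n for _ in range(n)]
    for w, p in enumerate(probs):
        row = dft_row(n, w)
        weight = p / (n * p) if reweight else p / 1.0
        for i in range(n):
            for j in range(n):
                gram[i][j] += weight * row[i].conjugate() * row[j]
    return gram


def max_dev_from_identity(gram):
    """Largest entrywise deviation of a square matrix from the identity."""
    n = len(gram)
    return max(abs(gram[i][j] - (1 if i == j else 0))
               for i in range(n) for j in range(n))


probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical nonuniform sampling on 4 frequencies
print(max_dev_from_identity(expected_gram(probs, reweight=False)))  # isotropy fails
print(max_dev_from_identity(expected_gram(probs, reweight=True)))   # isotropy restored
```

The reweighted expectation equals the identity up to rounding because the density factor cancels the sampling probability, leaving the uniform average over the tight frame.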

As discussed earlier, non-tight frame and/or non-orthonormal or non-RIP dictionaries arise in applications of compressed sensing, and in these instances too the conventional RIP analysis fails. We are currently investigating whether, and if so how, the above approach to "preconditioned" $\Psi$ may be extended in general beyond the aforementioned cases.

IV. Numerical Results 

We performed two experiments to compare the oblique pursuits to their conventional counterparts and to other methods. 

In the first experiment, we tested the algorithms on a generic data set. Synthesis operators $\Phi$ and $\tilde{\Phi}$ for a random biorthogonal frame $(\phi_\omega, \tilde{\phi}_\omega)_{\omega \in \Omega}$ were generated using random unitary matrices $U, V \in \mathbb{K}^{n \times n}$ and a fixed diagonal matrix $\Sigma$ as $\Phi = U\Sigma V^*$ and $\tilde{\Phi} = U\Sigma^{-1}V^*$. The diagonal entries of $\Sigma$ increase linearly from 1 to $\sqrt{2}$. Sensing matrix $A \in \mathbb{K}^{m \times n}$ was formed by $m$ random rows of $\Phi$ (with scaling), where the row selection was done with respect to the uniform distribution. Then, the condition number of $\mathbb{E}A^*A$ is 2 and the isotropy property is not satisfied. In this setting the oblique pursuit algorithms are different from their conventional counterparts. Signal $x^* \in \mathbb{K}^n$ is exactly $s$-sparse in the standard basis vectors ($D = I_n$) and the nonzero elements have unit magnitude and random signs. The success of each algorithm is defined as the exact recovery of the support.
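The construction of the biorthogonal pair can be illustrated in two dimensions. The sketch below is a toy version under stated assumptions: real rotation matrices stand in for the random unitary matrices, and the diagonal entries of the scaling matrix are taken as 1 and $\sqrt{2}$, which is the choice that makes the condition number of the Gram matrix equal to 2. The helper names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rotation(t):
    """2x2 rotation matrix, used here as a stand-in for a random unitary."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


U, V = rotation(0.7), rotation(-1.2)           # arbitrary fixed angles
sigma = [1.0, math.sqrt(2.0)]                  # assumed diagonal entries
S = [[sigma[0], 0.0], [0.0, sigma[1]]]
S_inv = [[1.0 / sigma[0], 0.0], [0.0, 1.0 / sigma[1]]]

Phi = matmul(matmul(U, S), transpose(V))           # frame synthesis operator
Phi_dual = matmul(matmul(U, S_inv), transpose(V))  # biorthogonal dual

cross = matmul(Phi, transpose(Phi_dual))  # biorthogonality: should be the identity
gram = matmul(Phi, transpose(Phi))        # not the identity: the frame is not tight

trace = gram[0][0] + gram[1][1]
det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
cond = (trace + disc) / (trace - disc)    # ratio of the two eigenvalues

print(cross)
print(cond)
```

The cross product recovers the identity even though neither operator is an isometry, which is the property the oblique pursuits exploit in place of isotropy.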

Figure 1 shows the empirical phase transition of each algorithm as a function of m/n and s/n. The results were averaged over 100 repetitions. Oblique versions of thresholding and IHT showed dramatic improvement in performance while the performance of the other algorithms is almost the same. While the oblique pursuit algorithms can be guaranteed without $A$ satisfying the isotropy property, the modification of the algorithms at least does not result in degradation of the performance.

In the second experiment, we tested the algorithms on a CS Fourier imaging system. The partial DFT sensing matrix $A$ used in this experiment was constructed using the variable density suggested by Lustig et al. [24]. We used a data-adaptive square dictionary $D$ that applies to non-overlapping patches. Dictionary $D$ was learned from the fully sampled complex valued brain image using the algorithm proposed by Ravishankar and Bresler [52] (see Example 3.6 for more detail). The resulting $D$ was well conditioned with condition number $\kappa(D) = 1.99$. The oblique pursuit algorithms use $\tilde{\Psi} = \tilde{A}\tilde{D}$, where $\tilde{D}$ is given as the biorthogonal dual $\tilde{D} = (D^{-1})^*$. Since the patches are non-overlapping, applying $D$, $\tilde{D}$, and their adjoint operators are patch-wise operations, and are computed efficiently.

[Fig. 1 panels: ObThres, ObMP, ObCoSaMP, ObSP, ObIHT, ObHTP]

Fig. 1. Phase transition of support recovery by various greedy pursuit algorithms (the horizontal and vertical axes denote the ratio m/n of number of measurements to number of unknowns and ratio s/m of sparsity level to number of measurements, respectively): signal $x^*$ is exactly $s$-sparse with nonzero entries that are ±1 with random sign, n = 1024, SNR = 30 dB, κ = 2.



The input image was a phantom image obtained by $s$-sparse approximation over the dictionary $D$ of an original brain image with sparsity ratio s/n = 0.125. Our goal in this experiment is not to compete with the state of the art of recovery algorithms in CS imaging systems; rather, we want to check whether the oblique pursuit algorithms perform competitively with their conventional counterparts in a setting where the RBOP of $(\Psi, \tilde{\Psi})$ is guaranteed. This motivates our choice of a simplified test scenario. We also compare the oblique pursuit algorithms to simple zero filling, and to NESTA [53] that solves the $\ell_1$ analysis formulation [23]. In fact, when the original brain image is used as the input image, all sparsity-based reconstruction algorithms, including NESTA, performed worse than zero filling.² To get a meaningful result in this setting, we replaced the input image by an exactly $s$-sparse phantom obtained by the $s$-sparse approximation of the original brain image.

TABLE II

Quality (PSNR in decibels) of images reconstructed from noisy variable density Fourier samples with measurement SNR = 30 decibels. Results averaged over 100 random sampling patterns.

(a) Downsample by 2

                Thres    CoSaMP   SP       IHT      HTP      ℓ1-Analysis   Zero Filling
conventional    14.48    42.93    45.24     9.06    40.02    34.04         34.73
oblique         37.59    43.30    44.79    44.96    45.53    -             -

(b) Downsample by 3

                Thres    CoSaMP   SP       IHT      HTP      ℓ1-Analysis   Zero Filling
conventional     9.34    29.46    34.74     9.34    31.58    30.96         31.55
oblique         31.13    32.21    36.17    31.10    36.26    -             -



Table II shows the PSNR of the reconstructed images using the various algorithms with different downsampling ratios. The error images truncated at the maximum magnitude of the input image divided by 10 are shown in Fig. 2. Downsampling by factors of 2 and 3 is presented, but the results for larger downsampling factors are qualitatively the same.

In most cases, the oblique pursuit algorithms performed better than the conventional counterparts. In the few exceptions, the difference in performance is not significant. In particular, ObSP and ObHTP performed significantly better than zero filling. We observed that thresholding and IHT totally failed in this experiment. In this experiment, the step size of IHT was fixed as 1 for its RIP-based guarantees. By employing an empirically tuned step size, the performance of IHT might be improved. In contrast, ObIHT provided a reasonable performance with a fixed step size.

² To achieve good performance on the original image requires a more sophisticated recovery algorithm with overlapping patches and adaptive sparsity level [54].

[Fig. 2 panels: ObHTP, 36.40 dB; SP, 34.58 dB; ObSP, 36.26 dB; ℓ1 Analysis, 29.75 dB; Zero Filling, 31.46 dB]

Fig. 2. Error images and PSNR for recovery by various algorithms from noisy measurements (the maximum intensity of the input image is normalized as 1): SNR = 30 dB, downsample by 3.

Fig. 2 also shows that the error in the reconstruction includes blocky artifacts that are more severe in the reconstruction by the $\ell_1$ analysis formulation. This issue can be resolved by replacing the non-overlapping patches by overlapping patches. Furthermore, sparse representation of overlapping patches allows more redundancy, which helps reduce the sparse approximation error. In this case, applying the inverse and the biorthogonal dual of the sparsifying transform are no longer patch-wise operations, but the inverse operation might still be efficiently computed by solving a structured inverse problem. More generally, the sparsifying dictionary might be replaced by any redundant dictionary.

However, we do not pursue the various possible improvements of the reconstruction performance in this paper. As mentioned earlier, the purpose of the numerical results in this section is just to confirm that the modification made in the oblique pursuit algorithms from the original ones does not degrade their empirical performance. It turned out fortuitously that the oblique pursuit algorithms, designed to provide guarantees in terms of the RBOP, also show significant improvement in empirical performance.



V. Conclusion

Previous guarantees for the reconstruction of sparse signals from compressive sensing via random frame matrices by various practical algorithms were provided in terms of the restricted isometry property (RIP) of the sensing matrix. Previous works on the RIP focused on scenarios where, to satisfy the isotropy property, the sensing matrix is constructed from i.i.d. samples from a tight frame according to the uniform distribution. However, the frame might not be tight due to the physics of the sensing procedure or due to the dictionary that provides a sparse representation. Furthermore, a non-uniform rather than the uniform distribution is often used for the i.i.d. sampling in practice in compressed sensing, especially in imaging applications, due to the signal characteristics or due to the limitation imposed by the physics of the applications. To derive guarantees without idealized assumptions, we proposed to exploit the property of biorthogonality that naturally arises in frame theory. We generalized the RIP to the restricted biorthogonality property (RBOP) that is satisfied without requiring the isotropy property. To take advantage of the new RBOP, we extended greedy pursuit algorithms with RIP-based guarantees to new variations, the oblique pursuit algorithms, so that they provide RBOP-based guarantees. These guarantees apply with relaxed conditions on the sensing matrices and dictionaries, which are satisfied by practical CS imaging schemes. The extension of greedy pursuit algorithms and their RIP-based guarantees to those based on the RBOP is not restricted to the specific algorithms studied in this paper. For example, Fast Nesterov's Iterative Hard Thresholding (FNIHT) [49] is another promising algorithm with a RIP-based guarantee, which will extend similarly. Finally, we note that although the oblique pursuit algorithms were designed to provide performance guarantees in the worst-case sense, they also perform competitively with or sometimes significantly better than their conventional counterparts empirically.



Appendix 



A. Preliminaries for the Appendix 



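Definition A.1 (Dilation [55]): The dilation of a matrix $M \in \mathbb{K}^{n \times n}$ is defined by

\[
\mathcal{S}(M) := \begin{bmatrix} 0 & M \\ M^* & 0 \end{bmatrix}.
\]

By definition, $\mathcal{S}(M)$ is a Hermitian matrix and its eigenvalues satisfy

\[
\lambda_i(\mathcal{S}(M)) =
\begin{cases}
\sigma_i(M) & \text{if } i \le n \\
-\sigma_{2n-i+1}(M) & \text{if } i > n.
\end{cases}
\]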
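One way to see why the eigenvalues of the dilation are the singular values with both signs is to square it. The sketch below assumes the standard Hermitian dilation with zero diagonal blocks and the matrix and its adjoint on the off-diagonal, restricted to a small real example; the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dilation(m):
    """Hermitian dilation [[0, M], [M^T, 0]] of a real square matrix M."""
    n = len(m)
    mt = transpose(m)
    top = [[0] * n + list(m[i]) for i in range(n)]
    bottom = [list(mt[i]) + [0] * n for i in range(n)]
    return top + bottom


M = [[3, 0], [4, 5]]
S = dilation(M)
S2 = matmul(S, S)

# S is symmetric, and S^2 is block diagonal with blocks M M^T and M^T M.
# Its eigenvalues are therefore the squared singular values of M, each twice,
# so the eigenvalues of S are plus and minus the singular values of M.
print(S == transpose(S))
print(S2)
print(matmul(M, transpose(M)), matmul(transpose(M), M))
```

Since the trace of the dilation is zero, the positive and negative eigenvalues must pair up, which matches the case distinction in the eigenvalue formula.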

Definition A.2 (Schur Complement): Let $M \in \mathbb{K}^{n \times n}$ be a square matrix that can be decomposed as follows:

\[
M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}
\]

where $M_{22} \in \mathbb{K}^{q \times q}$ for $q < n$ is a minor of $M$, which is also a square matrix. The Schur complement of the block $M_{22}$ of the matrix $M$, denoted by $M/M_{22}$, is the $(n - q) \times (n - q)$ matrix defined by

\[
M/M_{22} = M_{11} - M_{12} M_{22}^{\dagger} M_{21}.
\]

The following lemma extends [56, Theorem 5] to the non-Hermitian case.

Lemma A.3: Let $M \in \mathbb{K}^{n \times n}$ be a nonsingular matrix and $M_{22} \in \mathbb{K}^{q \times q}$ for $q < n$ be a minor of $M$. Then,

\[
\sigma_1(M) \ge \sigma_1(M/M_{22})
\]

and

\[
\sigma_i(M/M_{22}) \ge \sigma_{i+q}(M), \quad \forall i = 1, \ldots, n - q.
\]

Remark A.4: The analogous result for the Hermitian case [56, Theorem 5] assumed that $M$ is semidefinite and also showed that

\[
\sigma_i(M) \ge \sigma_i(M/M_{22}), \quad \forall i = 1, \ldots, n - q.
\]



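Proof of Lemma A.3: By the Cauchy interlacing theorem, $\sigma_q(M_{22}) \ge \sigma_n(M) > 0$; hence, $M_{22}$ is invertible. Let

\[
M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}
= \underbrace{\begin{bmatrix} M_{11} - M_{12} M_{22}^{-1} M_{21} & 0 \\ 0 & 0 \end{bmatrix}}_{=: M_1}
+ \underbrace{\begin{bmatrix} M_{12} M_{22}^{-1} M_{21} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}}_{=: M_2}.
\]

Let $M_{22} = U \Sigma V^*$ be the singular value decomposition of $M_{22}$. Then, $M_2$ is factorized as

\[
M_2 = \begin{bmatrix} M_{12} V \Sigma^{-1/2} \\ U \Sigma^{1/2} \end{bmatrix}
\begin{bmatrix} \Sigma^{-1/2} U^* M_{21} & \Sigma^{1/2} V^* \end{bmatrix}
\]

where the left factor has $q$ linearly independent columns and the right factor has $q$ linearly independent rows. Therefore, $\operatorname{rank}(M_2) = q$.

Now, we use Weyl's inequalities for the eigenvalues of the sum of two Hermitian matrices [57, Theorem III.2.1]. By applying [57, Theorem III.2.1] to $\mathcal{S}(M_1)$ and $\mathcal{S}(M_2)$, we obtain

\[
\lambda_{i+q}(\mathcal{S}(M_1) + \mathcal{S}(M_2)) \le \lambda_i(\mathcal{S}(M_1)) + \lambda_{q+1}(\mathcal{S}(M_2)) = \lambda_i(\mathcal{S}(M_1)), \quad \forall i = 1, \ldots, n - q,
\]

where we used the fact that $\lambda_{q+1}(\mathcal{S}(M_2)) = \sigma_{q+1}(M_2) = 0$ since $\operatorname{rank}(M_2) = q$. Therefore,

\[
\sigma_{j+q}(M_1 + M_2) \le \sigma_j(M_1), \quad \forall j = 1, \ldots, n - q.
\]

Since $M$ is invertible, $M/M_{22}$ is also invertible since $\sigma_{n-q}(M/M_{22}) \ge \sigma_n(M) > 0$. The inverse $(M/M_{22})^{-1}$ of the Schur complement is a minor of $M^{-1}$; hence,

\[
\sigma_1(M)^{-1} = \sigma_n(M^{-1}) \le \sigma_{n-q}((M/M_{22})^{-1}) = \sigma_1(M/M_{22})^{-1}.
\]
■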



Lemma A.5: Let $M \in \mathbb{K}^{m \times m}$. Then,

\[
\sigma_1(M - I_m) = \max\left(1 - \sigma_m(M),\ \sigma_1(M) - 1\right).
\]

Proof of Lemma A.5: If $M$ is a Hermitian matrix, then the proof is straightforward since the eigenvalues of $M - I_m$ are the eigenvalues of $M$ shifted by 1. Otherwise, by [57, Theorem III.2.8], it follows that



max \\k{y{M))-Xk{y{Im))\ 

ke [2m] 

\\.yiM)-.y{ira)\\ 

s; max |Afe(^(Af)) - A 

2m— 

fee [2m] 

where y{M) and y{Im) are the dilations of M and /,„, respectively. 
Since 

crfc(A/) k s^m 



\k{y{M)) = { 



-a.m-k+i{M) k> m 



and 



Afc(^(/m)) = < 



1 k ^ m 

— 1 k > m. 



it follows that 



max \\k{y{M))-Xk{y{Im))\ 

k£ [2m] 
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max \\k{y{M)) - \2m-k+i{-y{In.))\ 

fee [2m] 

max(l-a™(M),(Ti(M)-l); 



hence, 



\\M-I^\\ = ||^(A/-/„)|| 

= ||^(Af)-^(/,„)|| 

= max(l - cr„j(M),cri(M) - 1) 



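Lemma A.6: Let $M, \tilde{M} \in \mathbb{K}^{m \times k}$. Let $J_1 \subset [k]$ and $J_2 = [k] \setminus J_1$. Suppose $\tilde{M}^*M$ has full rank. Then,

\[
\left\| \tilde{M}_{J_1}^* \left( I_m - M_{J_2} (\tilde{M}_{J_2}^* M_{J_2})^{-1} \tilde{M}_{J_2}^* \right) M_{J_1} - I_{|J_1|} \right\| \le \| \tilde{M}^* M - I_k \|.
\]

Proof of Lemma A.6: To simplify the notation, let $E = I_m - M_{J_2} (\tilde{M}_{J_2}^* M_{J_2})^{-1} \tilde{M}_{J_2}^*$. By Lemma A.5, it follows that

\[
\| \tilde{M}_{J_1}^* E M_{J_1} - I_{|J_1|} \| = \max\left\{ 1 - \sigma_{|J_1|}(\tilde{M}_{J_1}^* E M_{J_1}),\ \sigma_1(\tilde{M}_{J_1}^* E M_{J_1}) - 1 \right\}.
\tag{A.1}
\]

Furthermore, since $\tilde{M}^*M$ has full rank, (A.1) is upper bounded by Lemma A.3 as

\[
\max\left\{ 1 - \sigma_k(\tilde{M}^*M),\ \sigma_1(\tilde{M}^*M) - 1 \right\} = \| \tilde{M}^*M - I_k \|
\tag{A.2}
\]

where the last step too follows from Lemma A.5. ■

Lemma A.7 ([58, Corollary 5.2]): Suppose that $E \in \mathbb{K}^{n \times n}$ is idempotent ($E^2 = E$) and is neither 0 nor $I_n$. Then, $\|I_n - E\| = \|E\|$.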
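The norm identity for idempotent matrices is easy to check on a 2x2 oblique projector. The sketch below uses a hypothetical projector with a free shear parameter and a closed-form spectral norm for real 2x2 matrices, obtained from the Frobenius norm and the determinant; all names are illustrative.

```python
import math


def spectral_norm_2x2(a):
    """Largest singular value of a real 2x2 matrix.

    Uses sigma_1^2 + sigma_2^2 = ||A||_F^2 and sigma_1 * sigma_2 = |det A|.
    """
    fro2 = sum(x * x for row in a for x in row)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = math.sqrt(max(fro2 * fro2 - 4.0 * det * det, 0.0))
    return math.sqrt((fro2 + disc) / 2.0)


def matmul_2x2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


t = 2.0                                # shear parameter; t = 0 gives an orthogonal projector
E = [[1.0, t], [0.0, 0.0]]             # oblique projector onto the first coordinate axis
I_minus_E = [[0.0, -t], [0.0, 1.0]]    # complementary projector

print(matmul_2x2(E, E) == E)           # idempotent
print(spectral_norm_2x2(E))            # exceeds 1 because the projection is oblique
print(spectral_norm_2x2(I_minus_E))    # same norm as E
```

The example also shows that the norm of an oblique projector grows with the shear, unlike an orthogonal projector whose norm is 1.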

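Lemma A.8: Let $\Psi, \tilde{\Psi} \in \mathbb{K}^{m \times n}$. Let $P \in \mathbb{K}^{n \times n}$ be an orthogonal projector in $\mathbb{K}^n$. Then, for all $x, y \in \mathbb{K}^n$,

\[
\left| \, |\langle \tilde{\Psi} P x, \Psi P y \rangle| - |\langle P x, P y \rangle| \, \right| \le \| P \tilde{\Psi}^* \Psi P - P \| \cdot \|x\|_2 \cdot \|y\|_2.
\tag{A.3}
\]

Proof of Lemma A.8: The proof follows from the properties of an inner product:

\[
\begin{aligned}
\left| \, |\langle \tilde{\Psi} P x, \Psi P y \rangle| - |\langle P x, P y \rangle| \, \right|
&\overset{(a)}{\le} \left| \langle \tilde{\Psi} P x, \Psi P y \rangle - \langle P x, P y \rangle \right| \\
&\overset{(b)}{=} \left| \langle x, P \tilde{\Psi}^* \Psi P y \rangle - \langle x, P y \rangle \right| \\
&= \left| \langle x, (P \tilde{\Psi}^* \Psi P - P) y \rangle \right| \\
&\le \| P \tilde{\Psi}^* \Psi P - P \| \cdot \|x\|_2 \cdot \|y\|_2
\end{aligned}
\]

where (a) follows from the triangle inequality, and (b) follows since $P^2 = P$ and $P^* = P$. ■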



B. Proof of Theorem 2.2 



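ObThres is guaranteed to recover $J^*$ if

\[
\min_{j \in J^*} |(\tilde{\Psi}^* y)_j| > \max_{j \in [n] \setminus J^*} |(\tilde{\Psi}^* y)_j|.
\tag{A.4}
\]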
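The recovery condition can be illustrated with a small real-valued example. The sketch below assumes that ObThres selects the $s$ indices with the largest magnitudes of the oblique proxy formed with the dual matrix, whereas conventional thresholding forms the proxy with the sensing matrix itself. The matrices, the signal, and the function name are hypothetical and chosen so that the dual is the transposed inverse of a square sensing matrix.

```python
def threshold_support(proxy_matrix, y, s):
    """Indices of the s largest-magnitude entries of proxy_matrix^T y."""
    m, n = len(proxy_matrix), len(proxy_matrix[0])
    corr = [sum(proxy_matrix[i][j] * y[i] for i in range(m)) for j in range(n)]
    order = sorted(range(n), key=lambda j: abs(corr[j]), reverse=True)
    return sorted(order[:s])


# Hypothetical square sensing matrix with a strongly correlated second column.
psi = [[1.0, 1.5, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
# Its biorthogonal dual, the transposed inverse, so that psi_dual^T psi = I.
psi_dual = [[1.0, 0.0, 0.0],
            [-1.5, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

x_true = [1.0, 0.0, -0.8]                     # support {0, 2}
y = [sum(psi[i][j] * x_true[j] for j in range(3)) for i in range(3)]

print(threshold_support(psi_dual, y, 2))      # oblique proxy
print(threshold_support(psi, y, 2))           # conventional proxy
```

In this noise-free toy case the oblique proxy reproduces the signal exactly, so the minimum over the support exceeds the maximum off the support, while the conventional proxy is misled by the correlated column.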



The jth component of ^*?/ is given as 



and satisfies 



where the third step follows since 



< |e*n{,}$*vl,n,,.x* - (a;*), I + 

= |e*(n{,|$*vi/nj. - Tiy}r,j')x*\ + \rjA 

<0,+i($**)l|a;*l|2 + max||V;,||2||z||2 (A.5) 
3 



Iin{,}**ra7.-n{,}.j.|| 



Then, (??) is obtained by applying (??) to (??). 



C. Proof of Proposition 2. 7 



Given J c. J* , the next step of ObMP given J finds an element from J*\J if 

Let E denote E-j^^^ j-^i- tz(^ j) simplify the notation. Then, P* = -E^^^^^x -n(^ j) ™ oblique projection. 

To derive a sufficient condition for (??), we first derive a lower bound of the left-hand side of (??) in the following: 

max \^*Ey\'Si max {ip* E'^'IlJi■x*\ - Ez\ 

> max \i^]E^Uj.x*\-\\^*\\2,^jE\\ \\zh. (A.7) 

^ V ' 

(*) 
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The term (*) in (??) is bounded from below by 



(*) = max 'ih* E^IiU j,\ jx* 



max 

max 

jeJ*\J 



(a 



^ max \(Uj.\jej, Uj.\jx*}\ 

jEj*\J 

: max |(x*),|-0.+i($*vI;)l|nj.^^x*||2 

J€J*\J 

^ \\Uj,\jx*y-9,+i{^*n\^j,\jx*\\2 



(A.8) 



where (a) holds by Lemma |A.8| since it follows, by Lemma |A.6| that 



\\iij.\j^*E^nj,\j-iij.\j\\ 

= m,\jE^j.\j-i\j.\j\\\ 

^ ||$**j-/,H0,+i($*vi/). 



Next, we derive an upper bound on the right-hand side of (??) in a similar way; 



max \^l}*Ey\!^ max l-ip* E'^/Uji.x*] + I'll}* Ez\ 

Mn]\J* Mn]\J* 

max \$*E^nj.x*\+\\^*\\2.^\\E\\ \\z\\2. 

(**) 



(A.9) 



The term (**) in (??) is upper bounded by 



(**) = max ^P*E^U^j,^^^})\jx* 

je[n\\J* 



max 

i€M\J* 

max 

je[n]\J* 



(b) 

s; max |<n(j.^{j-})\jej, n(j.^{j))\ja;*> 

]e[n]\J* 

= max +0,+i($*«')l|n(j*^{j})\ja;*||2 

je[n]\J* 



0s + li^*^mj.\jX*h 



(A. 10) 
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where (b) follows by Lemma |A.8| since it follows by Lemma |A.6| that 

l|n(j*u{j})\,/**£^*n(j.^{j-})\j - n(j.^{j-))\j|| 

ll**J*u{j})\J^*(./*u{i})\J -^|(.7*u{j})\J|ll 

Applying the bounds in (??) (??) to (??), we conclude that, for the success of the next step, it suffices to satisfy 

\\Ilj.\jX*U - 20.+i($**)||nj.\jx*||2 > 2||$*||2,3o||i?|| ll^b- 

Then, computing an upper bound on will complete the proof. 

When 'i> = E reduces to an orthogonal projection and satisfies 1. However, since we propose to use ^ E 

is an oblique projection and is not necessarily bounded by I. 



Since E is idempotent and E is neither or /„, by Lemma A. 7 it follows that 



\\E\\ = ||/„ - E\\ = 

^ II II 11$ J II ^ ll*./«llll$J-| 



D. Proof of Theorem \2.9\ 

The proof for the ObSP case is done by the following four steps. To simplify the notations, let 

= 03,($*^), 5 = <5,(^), and ~5 = 62s{^)- 

For J = {ji, . . . , c [n], define Rj : K" ^ by 

{Rjx)k = Xj^, Vfc 6 J, Mxe K", 

which is the reduction map to the subvector indexed by J. The adjoint operator i?* : ^ K" satisfies 

R*jy = 1] iy)kej 



fc=i 



where is the fcth column of /„. 



Lemma A.9 (Step 1): Under the assumptions of Theorem 2.9 

\\xt+i - a;*||2 < Pi||nj.\j^_^ja;*||2 + ti||z||2 

where pi and ti are given by 

1 , vm 

Pi = I and Ti = — . 
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Lemma A.IO (Step 2): Under the assumptions of Theorem 2.9 

l|nj.\J,_,ia;*||2 < P2\\^j,\j^^^X*\\2 + T2\\z\\2 

where p2 and T2 are given by 

1 + 6* ^ 2\/l+~5 
P2 = z ^ and T2 = — —. 



Lemma A.ll (Step 3): Under the assumptions of Theorem 2.9 



where and T3 are given by 



and 



II nj.\J,+i II 2 P3||nj.\j^a;*||2 + r3||z||2 
9 26(1 - 0) 



P3 = max 



1-6*' 1 + 261 + 26*2 



T3 = max 



1 2(1-61) \ 2v'm(l + 5) 

l~e' 1 + 26* + 26*2 j 1-6* 



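(Step 4): Finally, because $\operatorname{supp}(x_t) = J_t$,

\[
\| \Pi_{J^* \setminus J_t} x^* \|_2 = \| \Pi_{J^* \setminus J_t} (x_t - x^*) \|_2 \le \| x_t - x^* \|_2.
\]

Then, $\rho$ and $\tau$ are given as

\[
\rho = \rho_1 \rho_2 \rho_3 \quad \text{and} \quad \tau = \tau_1 + \rho_1 \tau_2 + \rho_1 \rho_2 \tau_3.
\]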
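The composite constants arise from substituting each step's bound into the previous one. The sketch below chains affine bounds of the form "quantity ≤ ρ_k · (next quantity) + τ_k · ‖z‖" and then unrolls the resulting one-step recursion; the numerical values are hypothetical and the function names are illustrative.

```python
def compose_steps(rhos, taus):
    """Chain bounds e_k <= rho_k * e_{k+1} + tau_k * ||z|| into a single bound."""
    rho, tau = 1.0, 0.0
    for r, t in zip(rhos, taus):
        tau += rho * t   # noise term picks up the product of the earlier rhos
        rho *= r
    return rho, tau


def error_bound_after(iterations, rho, tau, initial_error, noise_norm):
    """Unroll e_{t+1} <= rho * e_t + tau * ||z|| starting from initial_error."""
    err = initial_error
    for _ in range(iterations):
        err = rho * err + tau * noise_norm
    return err


# Hypothetical per-step constants for Steps 1 to 3.
rho, tau = compose_steps([0.5, 0.8, 0.9], [1.0, 2.0, 3.0])
print(rho, tau)
print(error_bound_after(50, rho, tau, initial_error=1.0, noise_norm=0.01))
```

When the composite contraction factor is below 1, the unrolled bound settles at the noise term scaled by $\tau/(1 - \rho)$, which is the usual form of a linear convergence guarantee.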

If we let $\tilde{\Psi} = \Psi$, ObSP reduces to SP, and the RBOP-based guarantee for ObSP also reduces to the RIP-based guarantee of SP. However, compared to the original guarantee [12], the guarantee of SP obtained from Theorem 2.9 requires a less demanding RIP condition.

The results for the other algorithms (ObCoSaMP, ObHTP, and ObIHT) are obtained from the corresponding results for the conventional algorithms (CoSaMP, HTP, and IHT) [14], [51]. We only need to replace $\Psi^*\Psi$ by $\tilde{\Psi}^*\Psi$ in the algorithms and replace $\delta_{ks}(\Psi)$ by $\theta_{ks}(\tilde{\Psi}^*\Psi)$ in the guarantees.

Constants p and r are explicitly given as follows: 

. ObCoSaMP 



P ■ 



/46'4s(***)2(l + 36'4s(***) 
l-04s(***)2 



2^ 



l-6l4.(***)2 l-6l4.(**'J') 
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ObSP 



ObHTP 



p = , max { ^ , 

T = ^ ^ h 



1 - 03s(***) (1 _ 03,($*VI/)). /l - 03,($*VI/)2 



+ 



2Vl + '^s(*)(l + '52.(*))V1 + 03.(***) 
l-03.($**)-(l-e3.($*^)) 

I 1 2 



(1 - 03s(**^))2 1 + 203.(«'**) + 203.(**^')^ 



OblHT 



' 2g3.(^*^)' 
l-03s(***)2^ 

/ 2 



+ 



1 



l-03.(**^')2 l-03s(***) 



l + <52s(*). 



p = 2^3, and T = 2Vl + <52s(*) 



Lemma A.10 is of independent interest to provide the finite convergence in Theorem 2.12. We stated Lemma A.10 as Lemma 2.10 in Section II. For ObCoSaMP and ObHTP, similar lemmata are obtained with a slight modification from the corresponding results [14], [51]. Constants $\bar{\rho}$ and $\bar{\tau}$ in Lemma 2.10 are explicitly given as follows:

. ObCoSaMP 



'1 + 36l4^(***)2 
l-6l4,($*^')2 

^ + 36l4.(**«')2 
l-04s($**) 



+ V3 Vl + '54s($). 



ObSP 



l-03.(***)2 



l + '54sW 
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ObHTP 



1 _ ji + s^^m 



l-(?3s(***) 



*^,]2 l-03.(***) 



Proof of Lemma A.9: Lemma A.9 is an extension of the analogous result by Foucart [51] to the biorthogonal case. The modification is done by replacing some matrices and introducing the RBOP instead of the RIP. We repeat the proof with appropriate modifications as a guiding example that shows how to modify the derivations using the RBOP.
Recall that xt+i is given as 

xt+i = argmm{l|$J^^^(?/ - -^x)! : supp (x) c Jt+i}- 
Therefore, by the optimality condition of the least square problem, it follows that 

(n.,*)*n,,(2/-*2:t+i) = o, 

but, by the RBOP, ^ has full row rank; hence, 

= ^%^^{-^{x* - xt+i) + z) = o, 



which implies 



Now, 



n.7,^,**^'(xt+i - X*) = iij^^,^*z. (A.ii) 



\\llj^^^{xt+i -x*)\\l 
= <xt+i-a;*, nj^^^{xt+i - X*)) 
= (xt+i - x\ (/„ - **$)n,7,^,(a;t+i - X*)} 
+ (xt+i-x*, ^'*$nj^_|_j(a;t+i - x*)> 

= (IIj.u Jt + 1 (2^4+1 - X*), {In - ^'*$)njj^j(xt + i - X*)) 

+ (Uj^^^^^'^ixt+i - X*), Uj^^^{xt+i - X*)} 

= (Xt + l - X*, Ilj*^j^_^^{In - **$)njj^^(xt+i - X*)} 

+ <njt+i**2, ^Jt+ii^t+i - X*)} 

(b) 

^ 6'||Xf + i - X*\\2\\llj^^-^{xt + l - X*)\\2 



+ Y 1 + (5||z||2||nj,+i(a;t+i - a;*)||2 (A.12) 
where (a) follows from (??), and (b) holds since 

l|nj.^,7,,,(/„-vi/*$)nj,^j| 
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l|nj 



where q = | J* u Jt+i\ 2s. 
It follows from (??) that 



l|nj,^i(a;t+i - x*)\\2 «; 6'i|a;t+i - a;*l|2 + VI + '^H-^^ia- 



Therefore, (??) implies 



\\xt+i - x*\\l 



^ 9\\xt+i -x*\\2 + \/l + 5\\z\\2] + 



hence, we have 



||a;t+i - x*\\2 



eViTsWzh + ^{i-9^)\\nj,\j^^^x*g + {i + 5)\\zg 



1-6*2 



1 liTT *ii , yi+^ ii 



Proof of Lemma A.10: Recall that $J_{t+1}$ is chosen as the subset of $\tilde{J}_{t+1}$ corresponding to the $s$ largest elements of 



)-ivI/* ?/; hence, it satisfies 

^ Jt+i Jt+i' Jt+i*'' 



which implies 



llnr^Rt )-^^% y\\ 

" '''+1 Jt+1 -'t+l' Jt+1*'" 



lin^ R% ($1 yll 

II Jt+l\Jt+l Jt+1 Jt+1^ Jt+1 ^11 

II Jt+l\J* Jt+1^ Jt+1 Jt+l' Jt+l*^ll 



(A.13) 



The left-hand side of (??) is the norm of the following term: 



Jt+i\Jt+i Jt+1 ^ Jt+1 Jt+i' Jt+1 

Jt+i\Jt+i Jt+i^ Jf+i Jt+i^ Jf+i^ ' 



n 



Jt+i^ Jf+i Jt+i^ 



— 1 ,Ti * 

Jt + 1 



(A. 14) 



The first summand in (??) is rewritten as 



Jt+i\Jt+i Jt+i ^ Jt+i Jt+i' Jt+i Jt+i 

Jt+i\Jt+i Jt+i^ Jt+i Jt+i' 7t+i Jt+i Jt+i 

Jt+i\Jt+i jt+i Jt+i 

= Ilj X*. 

By the RBOP, the other summands in (??) are bounded from above in the $\ell_2$ norm by 
and by 

lin^ i?t ($t *7 ZII2 ^ ^^^^||z||2. 

II Jt+i\Jt+i Jt+i^ Jt+i •it+i' Jt+i II 1 — ^ 

Combining ???????? implies that the left-hand side of (??) is lower bounded by 



IITT *ll ^ liTT *ll Vl^ll ,1 

I' Jt+i\Jt+i 1'^ 1 — J X^t+i I' 1 — 

The right-hand side of (??) is the norm of the following term: 

Jt+i\j* Jt+i Jt+i'' Jt+i 
.mj^^^x* + ^nj.^j^^^x* + z). 

Similarly to (??), the first summand in (??) is rewritten as 

In a similar way, the other summands in (??) are bounded from above in the $\ell_2$ norm by 

liny .j^R% (**~ *n,.,y x*\\2 ^ -^\\u,,.j x* 



and 



II Jt+iX-/* Jt+i^ J^t+i -Jt+i' Jt+i II 1-0 



Combining ???????? implies that the right-hand side of (??) is upper bounded by 



Therefore, by (??) and (??), we have 



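Note that $J^\star \setminus J_{t+1} = (J^\star \setminus \tilde{J}_{t+1}) \cup (J^\star \cap (\tilde{J}_{t+1} \setminus J_{t+1}))$, and that $J^\star \setminus \tilde{J}_{t+1}$ and $J^\star \cap (\tilde{J}_{t+1} \setminus J_{t+1})$ are disjoint. Therefore, since $x^\star$ 
is supported on $J^\star$, it follows that 

\[
\|\Pi_{J^\star \setminus J_{t+1}} x^\star\|_2^2 = \|\Pi_{J^\star \setminus \tilde{J}_{t+1}} x^\star\|_2^2 + \|\Pi_{\tilde{J}_{t+1} \setminus J_{t+1}} x^\star\|_2^2. \tag{A.25}
\]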
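
The support decomposition used here can be checked on a small example. The following is a minimal sketch, assuming that the pruned support is a subset of a merged candidate support and that the signal vanishes off its true support; the names `J_star`, `J_tilde`, `J_next`, and `x_star` are hypothetical stand-ins for the true support, the merged support, the pruned support, and the sparse signal.

```python
import random

def restricted_norm_sq(x, support):
    """Squared l2 norm of x restricted to the index set `support`."""
    return sum(abs(x[j]) ** 2 for j in support)

rng = random.Random(0)
n, s = 12, 3

# Hypothetical index sets: true support, merged support, pruned support.
J_star = set(rng.sample(range(n), s))
J_tilde = set(rng.sample(range(n), 2 * s))
J_next = set(rng.sample(sorted(J_tilde), s))  # a subset of J_tilde

# A signal supported on J_star only.
x_star = [rng.gauss(0.0, 1.0) if j in J_star else 0.0 for j in range(n)]

# The two pieces of the decomposition of J_star \ J_next.
part_a = J_star - J_tilde
part_b = J_star & (J_tilde - J_next)

# Pythagorean identity for the restricted norms. The second term can be taken
# over J_tilde \ J_next because x_star is zero outside J_star.
lhs = restricted_norm_sq(x_star, J_star - J_next)
rhs = restricted_norm_sq(x_star, part_a) + restricted_norm_sq(x_star, J_tilde - J_next)
```

The two pieces are disjoint and their union is the set on the left-hand side, so the squared norms add exactly, which is all that the identity requires.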

Applying (??) to (??), we obtain 



26* „^ *„ 2VT+1, 



which implies the desired inequality after simplification using $\sqrt{a^2 + b^2} \le a + b$ for $a, b \ge 0$. ■ 
Proof of Lemma A.11: The last step in each iteration of ObSP updates xt by xt = -Rj^($*^5'.7j^^$j^y. Since 5* 



and * satisfy the RBOP, by Lemma 2.4 ^./^ (** vj^jj^^**^ is a valid oblique projector onto 7?.(*jJ along 

Then, /„ - ^fj^ ($* ^fjj-i^*^ and $jj are also oblique projectors. Let E denote the oblique projection 

In - *Jt(^jt*Jt)"^*jt to simplify the notation. Then, 

^*{y--^Xt) = ^*Ey. 
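
The properties of the residual projector $E$ used in this proof can be illustrated numerically. The sketch below assumes the real-valued case with a support of size two and reads $E$ as the identity minus $\Psi_J(\tilde\Psi_J^T\Psi_J)^{-1}\tilde\Psi_J^T$; the matrices `Psi_J` and `Psi_tilde_J` are arbitrary hypothetical examples and the helper functions are not taken from the paper.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(G):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs(A):
    return max(abs(v) for row in A for v in row)

def oblique_residual_projector(Psi_J, Psi_tilde_J):
    """E = I - Psi_J (Psi_tilde_J^T Psi_J)^{-1} Psi_tilde_J^T for two columns."""
    m = len(Psi_J)
    gram = matmul(transpose(Psi_tilde_J), Psi_J)
    P = matmul(matmul(Psi_J, inv2(gram)), transpose(Psi_tilde_J))
    return [[(1.0 if i == j else 0.0) - P[i][j] for j in range(m)] for i in range(m)]

# Hypothetical columns of the sensing matrix and of its biorthogonal counterpart.
Psi_J = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
Psi_tilde_J = [[1.0, 0.5], [0.0, 1.0], [0.5, 0.0], [1.0, 1.0]]

E = oblique_residual_projector(Psi_J, Psi_tilde_J)
EE = matmul(E, E)

idempotence_gap = max_abs([[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(EE, E)])
range_gap = max_abs(matmul(E, Psi_J))                   # E annihilates the columns of Psi_J
dual_gap = max_abs(matmul(transpose(Psi_tilde_J), E))   # Psi_tilde_J^T E vanishes
asymmetry = abs(E[0][1] - E[1][0])                      # nonzero: E is not an orthogonal projector
```

The vanishing of `dual_gap` is the property that forces the newly selected indices to be disjoint from the current support, and the nonzero `asymmetry` shows that the projector is genuinely oblique.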

Let 

J=supp (Hj^'^Ey 



Since $E^* \tilde{\psi}_j = 0$ for all $j \in J_t$, it follows that J is disjoint from $J_t$. 
By definition of J, we have 

||$*-Ey||2 ^ m.Eyh. 

hence, it follows that 

m\j.Ey\\2 ^ \\^*.\jEy\\2. (A.26) 
Since $E \psi_j = 0$ for all $j \in J_t$, the left-hand side of (??) is the norm of the following term: 

^yj.Ey = ^^j,Ei-^Ilj.\j^x' + z). (A.27) 
The first summand in (??) is upper bounded by 



(*) 



where (*) is upper bounded by 



= ||nj\,;.($*i?*-/„)n,.\jj 



lin 



(JuJ*)\J, 



uJ*)\Jtl 



i**juJ*)\Jt^*(./^J*)\./t - ^|(,JuJ*)\Jt| 



*JuJ. -^iJu, 7* I II < 



Therefore, 



E^Uj.\j^x*h^e\\Ilj,\j^x*\\2. 



The first summand in (??) is upper bounded by 



(a) 



^ Vl + '^ll-^n-^llll^b 

l+~Sl^j,i^l^jJ-'^l\\\\z\\2 



where (a) follows from Lemma A.7. 

The right-hand side of (??) is the norm of the following term: 



(A.28) 



(A.29) 



(A.30) 



where the first equality holds since E*ipj = for all j e Jj and the last equality holds since J*\Jt = {J*\Jt+i) u (J* n J), 
and J*\Jt+i and J* n J are disjoint. 

The first term in (??) is lower bounded by 



^ a, 



E^ , 



lin, 



>^i./*\./|(*j*\j*./-\./)l|nj.\j,,,^1l2 



The second term in (??) is lower bounded by 



(**) 



where (**) is further upper bounded by 



n 



(J*\Jt+i)u(J*n,/) 



(J*\Jt+i)u(J*nJ)ll 



**j*Vt+i)u(7*nJ)-^*(^*\Jt+l)u(J*nJ) ^\{J*\Jt+i)u{J*r,J)\ 
ll**j*\Jt+i)u(J*n J)-^*(7*Vt + l)u(J*nJ) ~ ^\{J*\Jt+i)^{J*r,J)\ 



Therefore, 

The last term in (??) is upper bounded by 



i^j*\jt-^^ll2 < ^ — ^ Fib- 



1 - 



Applying ?????????????? to ??, we obtain 



Since 



(??) implies 



-2 2- 



1lnj.\j,a;*||2 + ^^f^ ^||z||2 



>{!- e)\\Uj^^j^^^x*h - 0^\\Ilj.\j^x*\\l-\\Uj.^j^^^x*\\l 
The final result is obtained by simplifying (??). 
To simplify the notation, let 



b = l|nj*\j,a;*||2 



2vm(i + 5)„ „ 

c= ll^b. 



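Then, (??) reduces to 

\[
\theta b + c \ge (1 - \theta) a - \theta \sqrt{b^2 - a^2},
\]

which is equivalent to 

\[
\theta \sqrt{b^2 - a^2} \ge (1 - \theta) a - (\theta b + c).
\]

If $(1 - \theta) a \le \theta b + c$, then 

\[
a \le \frac{\theta b + c}{1 - \theta}. \tag{A.36}
\]

Otherwise, if $(1 - \theta) a > \theta b + c$, we have 

\[
\theta^2 (b^2 - a^2) \ge [(1 - \theta) a - \theta b - c]^2,
\]

which implies 

\[
(2\theta^2 - 2\theta + 1) a^2 - 2(1 - \theta)(\theta b + c) a + (\theta b + c)^2 - \theta^2 b^2 \le 0.
\]

Therefore, 

\[
a \le \frac{2\theta(1 - \theta)}{2\theta^2 - 2\theta + 1}\, b + \frac{2(1 - \theta)}{2\theta^2 - 2\theta + 1}\, c. \tag{A.37}
\]

Combining ???? completes the proof. 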
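
The case analysis above can be checked by random sampling. The sketch below is not part of the proof: it assumes $0 < \theta < 1$, $0 \le a \le b$, and $c \ge 0$, uses the coefficient $2\theta^2 - 2\theta + 1 = (1-\theta)^2 + \theta^2$ obtained by expanding the squared inequality, and counts sampled points that satisfy the premise but exceed the larger of the two case bounds. The function names are hypothetical.

```python
import math
import random

def bound_on_a(theta, b, c):
    """Larger of the two case bounds on a from the case analysis."""
    case1 = (theta * b + c) / (1.0 - theta)
    case2 = 2.0 * (1.0 - theta) * (theta * b + c) / (2.0 * theta**2 - 2.0 * theta + 1.0)
    return max(case1, case2)

def premise_holds(theta, a, b, c):
    """The inequality theta*b + c >= (1 - theta)*a - theta*sqrt(b^2 - a^2)."""
    return theta * b + c >= (1.0 - theta) * a - theta * math.sqrt(b * b - a * a)

def count_violations(trials=20000, seed=0):
    """Count samples where the premise holds but a exceeds the bound."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        theta = rng.uniform(0.01, 0.99)
        b = rng.uniform(0.0, 1.0)
        a = rng.uniform(0.0, b)   # a <= b keeps the square root real
        c = rng.uniform(0.0, 1.0)
        if premise_holds(theta, a, b, c) and a > bound_on_a(theta, b, c) + 1e-12:
            violations += 1
    return violations

violations = count_violations()
```

In the second case the bound is twice the vertex of the quadratic in $a$, which dominates its larger root because the constant term $(\theta b + c)^2 - \theta^2 b^2$ is nonnegative when $c \ge 0$.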



E. Proof of Theorem 3.1 

Let X be a random variable defined as 



X = max 

J|=s 



Let ^fc and be the transposed kth row of ^/m^ and y^^^ respectively, for all k e [n]. By the assumption, {£^kYk=i and 
(Cfc)fc"=i are sequences of independent random vectors such that 



E^k^l = E^-** and E(kQ = El'*^' 



for all k 6 [to]. Then, X is rewritten as 



X = max 

|J|=S 
Like the RIP analysis for the Hermitian case [18], [20], the first step is to show 



EX rSi — . 

9 



By symmetrization [20, Lemma 6.7], EX is bounded from above by 

2 



EX 2E max 

\J\=S 



-E max 

m \J\=s 



where $(\epsilon_k)_{k=1}^m$ is a Rademacher sequence independent of $(\xi_k)_{k=1}^m$ and $(\zeta_k)_{k=1}^m$. 
Define random variables $X_1$ and $X_2$ by 



Then, $X_1$ and $X_2$ are rewritten as 



Xi = max 

\J\ = S 



X2 — max 

\J\ = S 



nj(4'*^'-E***) Hj 



Hj I -E***i n 



Xi = max 

\.J\=S 



X2 = max 
|j|=s 



(A.38) 



By symmetrization, $\mathbb{E}X_1$ and $\mathbb{E}X_2$ are bounded from above by 



EX I — Emax 

TO |,7|=s 



EX2 sS —Emax 

TO |J| = s 



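Lemma A.12 ([18, Lemma 3.8]): Let $(h_k)_{k=1}^m$ be vectors in $\mathbb{K}^n$. Let $K_h = \max_k \|h_k\|_\infty$. Then, 

\[
\mathbb{E}_\epsilon \max_{|J|=s} \Big\| \Pi_J \Big( \sum_{k=1}^m \epsilon_k h_k h_k^* \Big) \Pi_J \Big\|
\le C_3 \sqrt{s}\, \ln s\, \sqrt{\ln n}\, \sqrt{\ln m}\; K_h \max_{|J|=s} \Big\| \Pi_J \Big( \sum_{k=1}^m h_k h_k^* \Big) \Pi_J \Big\|^{1/2}.
\]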



Since max^ ||Cfc||oo ^ K, by Lemma A.12 it follows that 



If 



EXi 2C3. /—In fiVrn^VrnTOi^VE^i + 1 + (E'i'*^'). 

V TO 



2C3A /— lnsVlimVln^ii:v'2 + 6's(E1'*«') Ji 
for some $\delta_1 < 1$, then $\mathbb{E}X_1 \le \delta_1$ and it follows that 



2C3 A / — In s Vln n Vln mKE max 



n.(i|«)n. 

2C3A / — In sVln^Vln^isrv'lE^i + 1 + 6's(E***) 

V TO 

2C3./ — In sVln^Vln^ii: V-^i + 1 + 6'^(E***) 



1/2 



2C3^ —\nsV\^V\^K^/2+l^K^'*^ sS Si. 



(A.39) 



Since maxj, ||Cfc||oo ^ K, by Lemma A. 12 it follows that 



Similarly, if 



EX2 2C3J— InsVln^Vln^WEXa + 1 + 6'JE$*$). 
V m 

2C'3y^lns\/ln^VWi^y'2 + ^^(E^*^) S2 



for some $\delta_2 < 1$, then $\mathbb{E}X_2 \le \delta_2$; hence, 



2C3a/— InsVln nVl 



E max 

|J|=S 



n.(l|a4.)n. 



1/2 



S2 



(A.40) 



Unlike the conventional RIP analyses [18], [20], the matrices $(\zeta_k \xi_k^*)_{k=1}^m$ are not Hermitian symmetric. The following lemma is 

modified from Lemma A.12 to get a bound on EX for the non-Hermitian case. 
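
The loss of Hermitian symmetry is easy to see on a single rank-one term. The sketch below uses two arbitrary complex vectors, named `xi` and `zeta` only to mirror the rows of the two matrices; they are hypothetical values, not data from the paper.

```python
def outer(u, v):
    """Rank-one matrix u v^*, where v^* is the conjugate transpose of v."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]

def is_hermitian(M, tol=1e-12):
    """Check whether M equals its own conjugate transpose."""
    n = len(M)
    return all(abs(M[i][j] - M[j][i].conjugate()) <= tol
               for i in range(n) for j in range(n))

# Hypothetical rows of the sensing matrix (xi) and of its biorthogonal counterpart (zeta).
xi = [1 + 1j, 2 + 0j, 0 - 1j]
zeta = [0.5 + 0j, 1 - 1j, 2 + 0j]

same_vector_term = outer(xi, xi)     # the kind of term in the conventional RIP analysis
mixed_term = outer(zeta, xi)         # the kind of term in the biorthogonal analysis

hermitian_same = is_hermitian(same_vector_term)
hermitian_mixed = is_hermitian(mixed_term)
```

Because the mixed term is not Hermitian, its operator norm cannot be expressed through a single quadratic form, which is why the bilinear form in two independent unit vectors appears in the proof that follows.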

Lemma A.13: Let $(h_k)_{k=1}^m$ and $(\tilde{h}_k)_{k=1}^m$ be vectors in $\mathbb{K}^n$. Let $K_h = \max_k \|h_k\|_\infty$ and $K_{\tilde{h}} = \max_k \|\tilde{h}_k\|_\infty$. Then, 



Eg max 

\J\ = S 



^ C's-v/s In sVln nVln m 



if/i max 

\J\ = S 



1/2 



+ max 

\J\=s 



n, 2 ^fe/^* n. 



1/2- 



(A.41) 



Proof of Lemma A.13: By a comparison principle [59, inequality (4.8)], the left-hand side of (??), denoted by $E_1$, is 
bounded from above by 



El — E„ max 



\J\ = S 



2 



E„ max max 

I J| = s x,yeB^ 



k = l 



where $(g_k)_{k=1}^m$ is the standard i.i.d. Gaussian sequence and $B_2^J = \{x \in \mathbb{K}^n : \|x\|_2 \le 1,\ \operatorname{supp}(x) \subset J\}$. 
Define a Gaussian process $G_{x,y}$ indexed by $(x, y) \in \mathbb{K}^n \times \mathbb{K}^n$ as 

\[
G_{x,y} = \sum_{k=1}^m g_k\, y^* \tilde{h}_k h_k^* x.
\]

By Dudley's inequality, $E_1$ is bounded from above by 

\[
E_1 \le C_4 \int_0^\infty \Big( \ln N\Big( \bigcup_{|J|=s} (B_2^J \times B_2^J),\, d,\, u \Big) \Big)^{1/2} du,
\]

where $N(B, d, u)$ is the covering number of the set $B$ with respect to the metric $d$, induced from the Gaussian process $G_{x,y}$ by 

\[
d((x,y),(x',y')) = \big( \mathbb{E} |G_{x,y} - G_{x',y'}|^2 \big)^{1/2}.
\]



Let 



and 



Ml = max 

\J\ = S 



M2 — max 

\J\ = S 



\k=l 



<k=l 



1/2 



1/2 



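Define 

\[
\|x\|_h = \max_{k \in [m]} |h_k^* x| \quad \text{and} \quad \|x\|_{\tilde{h}} = \max_{k \in [m]} |\tilde{h}_k^* x|
\]

for $x \in \mathbb{K}^n$. Then, $\|\cdot\|_h$ and $\|\cdot\|_{\tilde{h}}$ are valid norms on $\mathbb{K}^n$ induced by $(h_k)_{k=1}^m$ and $(\tilde{h}_k)_{k=1}^m$, respectively. 
Let $x, x', y, y'$ be arbitrary $s$-sparse vectors in $\mathbb{K}^n$. Then, $d((x,y),(x',y))$ is upper bounded by 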

d((a;,y),(ar',y))2=E|G,,^-G,.,j2 

m 

= E| 2 gky*hkhl{x - x')\ 

k=l 

m 

^ max |/;*(a; - x')\'^E\ V gky*hkf 

fc6[„] 

m 

= max|/i*(a;-a;')r2l2'*'^'^l' 

= \\x-x'\l\hl{x-x')\^Y^y''hkhly 



k=l 



^Ml\\x-x'\l 



(A.42) 



where the fourth step follows since $(g_k)_{k=1}^m$ is the standard i.i.d. Gaussian sequence. 
Similarly, $d((x',y),(x',y'))$ is upper bounded by 

d{{x',y),{x',y'))^M,\\y-y\. 



(A.43) 
Then, by the triangle inequality and ????, it follows that 

\[
d((x,y),(x',y')) \le d((x,y),(x',y)) + d((x',y),(x',y')) \le M_2 \|x - x'\|_h + M_1 \|y - y'\|_{\tilde{h}}.
\]

Hence, 



(lniV( U {Bi xBi),d,u)y^^ 

\J\=S 

(ln7V( U Bi,M2\\ ■ \\h,u/2 

\.I\=S 

+ (ln7V( U B'i,Ah\\-t^,u/2)) 

\J\=S 

^2M2v^(lniV( (J l=BiA\-U,,u) 

\J\=s ^ 

+ 2Mi(lniV( y j=Bi,\\-\\~„u)) 



1/2 



1/2 



The remaining steps are identical to the Hermitian case ([20, Lemma 8.2], [18, Lemma 3.8]) and we do not reproduce the 
details. We obtain the desired bound by noting 

1/2 



^ C'iKh In sVlnnVlnm 



and 



2,11 • Wh^U^ 



1/2 



^ C^Kj^ In sVlnnVlnm, 



which have been shown in the proof of [18, Lemma 3.8]. 
Let 

^1 = 



and 



^J2 + 6I^(E$*$) 



Applying Lemma A.13 to (??), we obtain the following bound on EX. If 



2C3 A / — In sVlnnVlnm 
(8/9)^ 



then 



EX ^ 2E max 

J|=s 



^ 2C3\ — InsVlnnVlir) 



•E 



_ftr max 

.7|=s 



+ K max 

.71 = s 



n,,(imn. 

n.(i|ac;)n. 



1/2 



1/2- 



(^1 + ^2 



where the last inequality follows from (??) and (??). 

The second step is to show that X is concentrated around EX with high probability. The corresponding result for the 
Hermitian case [20, Section 8.6] has been derived using a probabilistic upper bound on a random variable defined as the 
supremum of an empirical process [20, Theorem 6.25]. We show that the derivation for the Hermitian case [20, Section 8.6] 
extends to the non-Hermitian case with slight modifications. 

Let $B_2^J = \{x \in \mathbb{K}^n : \|x\|_2 \le 1,\ \operatorname{supp}(x) \subset J\}$. Since $B_2^J$ is closed under multiplication by any scalar of unit modulus, 
mX is written as 



Define /, 



x,y 



mX = max max 



by 



fc=i 



max max Re ^ y*(CfeCfc - 

\J\ = sx,y<,Bi Vi^l 



fx,y{Z)=Re{v*{Z-^Z)x). 



Then, ^fx.yiCk^X) ^ ^ ^ ^ [^^] rewritten as 

mX = max { 2 : v) 6 IJ (^2 x 62') 

"^'^ fe=l |J| = S 

Let fc e [m] be fixed. Let x, y e 62 ■ Then, 



< ||nj(a4*-ECfee^)n7||;^!^,„ 
•||nj(a4*-Ea4*)n7||,?_^ 
= ||nj(a4*-Ea4*)n7 



CO ''GO 

1/2 



||nj(aa*-Eac*)n„ 



II 1/2 



(A.44) 
where the third inequality follows from Schur's interpolation theorem [60]. 
We derive an upper bound on |nj(Cfc^^ - KCk^*)Uj\\ by 



||nj(a^fe*-Ea4*)n 



||nja4*n,,||^„^^„ +E|nXfe4*nj||,„_,„ 

s; 2sKK (A.45) 



where the second inequality follows from Jensen's inequality, and the last step holds since 
Similarly, we have 



By applying ???? to (??), we obtain 



||nj(efeC* - EaCfe)njlU_,„ < 2sKK. (A.46) 



\f.,viCkek)\^2sKK. (A.47) 



Since $k$ was arbitrary, (??) implies that $f_{x,y}(\zeta_k \xi_k^*)$ is uniformly bounded for all $(x,y) \in \bigcup_{|J|=s} (B_2^J \times B_2^J)$ and for all 
$k \in [m]$. 
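
The uniform bound can be illustrated for a single uncentered rank-one term. The sketch below assumes real vectors and induced $\ell_1 \to \ell_1$ and $\ell_\infty \to \ell_\infty$ norms, and checks that both are at most $s K \tilde{K}$ and that their geometric mean dominates the spectral norm, as in Schur's interpolation step. The vectors and the helper names are hypothetical.

```python
import itertools
import math

def restricted_rank_one(zeta, xi, J):
    """Nonzero block of Pi_J zeta xi^T Pi_J, indexed by J (real case)."""
    return [[zeta[i] * xi[j] for j in J] for i in J]

def norm_inf_to_inf(M):
    """Induced l_inf -> l_inf norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def norm_one_to_one(M):
    """Induced l_1 -> l_1 norm: maximum absolute column sum."""
    return max(sum(abs(v) for v in col) for col in zip(*M))

def spectral_norm_rank_one(zeta, xi, J):
    """Spectral norm of the restricted rank-one block."""
    return math.sqrt(sum(zeta[i] ** 2 for i in J)) * math.sqrt(sum(xi[j] ** 2 for j in J))

# Hypothetical rows with sup-norms K and K_tilde.
xi = [0.9, -0.4, 0.7, 0.2, -1.0]
zeta = [0.5, 0.8, -0.3, 0.6, 0.1]
K = max(abs(v) for v in xi)
K_tilde = max(abs(v) for v in zeta)
s = 2

worst_induced = 0.0   # largest induced norm over all supports of size s
schur_ok = True       # spectral norm <= sqrt(norm_1 * norm_inf) on every support
for J in itertools.combinations(range(len(xi)), s):
    M = restricted_rank_one(zeta, xi, J)
    n1, ninf = norm_one_to_one(M), norm_inf_to_inf(M)
    worst_induced = max(worst_induced, n1, ninf)
    if spectral_norm_rank_one(zeta, xi, J) > math.sqrt(n1 * ninf) + 1e-12:
        schur_ok = False
```

The centered term picks up a factor of two from the triangle inequality and Jensen's inequality, which is consistent with the bound $2 s K \tilde{K}$ stated in the text.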

We also verify that the second moment of $f_{x,y}(\zeta_k \xi_k^*)$ is uniformly bounded by 

E|/.,,(a4*)i' 

= E|2;*nj(ae;?-Ea4*)n7x|' 

^E||nj(a.4*-Ea4*)njx||2 

= Ep (||njail^njac*n,, - n,,(EaCfe)nja4*nj 

- Uj(kCMEaek)'nj + n,7(Ea-a*)nj(Eae*)nj)x] 
= X* (||nXfcl|^nj(Ea4*)n,/ - iij{E^kCmA^Ckek)'nj)x 
< l|n;ail^l|n7(Eae*)n,7ll + \\Uj{E^kCmjf 

sK^{l + 9,(E^*-9)) + 1 + 0,(E***). 



Then, by [20, Theorem 6.25], we obtain (??). 



¥{X 5= (5) sS P toX 5= toEX + 



m5 



I 



exp 



V9 ' 2sKK J 



jf^(l + 0,(Evl/*vl.) + i±Mp!il) 



I 32(5 _ m I 2(5 m 



9 2sKK 27 2sKK 
exp 



mS 



6sK ( 27X ( 1 + 0,(E***) + '+';|r'^^ ) + 98(5if 



^ exp ^ ^ . (A.48) 

\^ l236sKmax{K,K) J 

Therefore, P(6's($**) ^ 6) t] holds provided that m satisfies 

• s(lns)^ Inn In TO 

and 

m 5= C2(5"^Xniax(/C, X)sln(7r^) 

for universal constants $C_1$ and $C_2$. 
F. Proof of Theorem 3.4 

Since 4* = AD, by the construction of A from in (??), it follows that maxfe.£ ^ sup^ maxj |(0^,c?j)|; 



hence, the incoherence property of ^ is satisfied by the assumption. To invoke Corollary 3.2 it remains to show 6's(E4'*4') < 
Kq. By the definition of Og, 6*5 (E^I'*'!') is rewritten as 



max \\D'^j¥.A*AD.,\\ - 1, 1 - min A„(£'5E^M£ij) 

I J| — S \J\=S 



(E***) = max 

Let J be an arbitrary subset of [n] with s elements. Then, it follows that 



(A.49) 



\\D*j¥.A*ADj\\ < ||D}||||EA*A||||Dj|| 
= IIEAMIIpjp 

<t'max||***l|[l + <5sP)] 

<i'max[l+0d($$*)][l + <5sP)] (A.50) 



and 



A„(D*EAMi?j) > (7„(i?*)A„(Ev4M)a„(Z?j) 
5= A„(EAM)A„(i?*;Z?j) 

>i'minA„($$*)[l-<5.p)] 

> i'min[l - e„($$*)][l - <^.P)]. (A.51) 

Applying (??) and (??) to (??), we verify that given in (??) is a valid upper bound on (^^(E^'*'!'). This completes the 
proof. 
G. Proof of Theorem 3.5 

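First, we note that the mutual incoherence between $(\phi_\omega)_{\omega \in \Omega}$ and $(d_j)_{j \in [n]}$ is written as an operator norm given by 

\[
\sup_{\omega \in \Omega} \max_{j \in [n]} |\langle \phi_\omega, d_j \rangle| = \|\Phi^* D\|_{\ell_1^n \to L_\infty(\Omega, \mu)}.
\]

Similarly, the mutual incoherence between $(\tilde{\phi}_\omega)_{\omega \in \Omega}$ and $(\tilde{d}_j)_{j \in [n]}$ is written as 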

wen jeN 

where A^;^ : L2{i^,fJ.) L2(^^,m) is a diagonal operator defined by 

(A;i/i)(w) = ^H/iH, V/i e L2in, n). 
Then, || A^^^*!)!^^^^^ (q^^, is upper bounded using K as follows: 

||A;i$*5j|,„_i^,(^,^) 

= ||A;i$*($$*)-ii?(i?*i?)^i||,.^i.^,(o,,o 

+ II A;1$* _ /Ji?(I?*i?)-i||,.^i„(a,^) 

1 



^min 



•||D||,„^,.||(i^*i5)-i||,.^, 



^min 



if + ( supll^^ll^c 

'en 



• max lldi-Ld 



(A.52) 



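Let $\tilde{K}$ be the right-hand side of (??). Then, we apply the incoherence parameters $K$ and $\tilde{K}$ to Corollary 3.3. Since $\mathbb{E}\tilde{\Psi}^*\Psi = I_n$, 
we have $\theta_s(\mathbb{E}\tilde{\Psi}^*\Psi) = 0$. Therefore, to obtain a condition on $m$, it only remains to bound $\theta_s(\mathbb{E}\Psi^*\Psi)$ and $\theta_s(\mathbb{E}\tilde{\Psi}^*\tilde{\Psi})$. 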



In the proof of Theorem 3.4, we derived an upper bound on $\theta_s(\mathbb{E}\Psi^*\Psi)$ given by 



6',(E***) max(l - ly^in, J^max - 1) 

+ J^m.MD) + + (5„(i?)(?d(***)]. (A.53) 

This upper bound is tight in the sense that equality is achieved if $\theta_d(\Phi\Phi^*) = \delta_n(D) = 0$, which holds, for example, for 
Fourier compressed sensing with signal sparsity over an orthonormal basis $D$. 
Similarly, we derive an upper bound on $\theta_s(\mathbb{E}\tilde{\Psi}^*\tilde{\Psi})$. Recall that $\tilde{D}$ is written as 

\[
\tilde{D} = D (D^* D)^{-1} = D (D^* D)^{-1/2} (D^* D)^{-1/2}.
\]
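
The biorthogonality of this dual can be verified on a small example. The sketch below assumes the real case and reads the display above as $\tilde{D} = D(D^T D)^{-1}$; the 3-by-2 matrix `D` is a hypothetical dictionary with linearly independent columns, and the helpers are illustrative only.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(G):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical dictionary with two linearly independent columns.
D = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

gram = matmul(transpose(D), D)           # D^T D
D_tilde = matmul(D, inv2(gram))          # D (D^T D)^{-1}
product = matmul(transpose(D_tilde), D)  # should equal the 2x2 identity

biorthogonality_gap = max(
    abs(product[i][j] - (1.0 if i == j else 0.0))
    for i in range(2) for j in range(2)
)
```

When the columns of `D` are orthonormal, the Gram matrix is the identity and the dual coincides with `D`, which is the special case in which the bounds in this proof hold with equality.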



Therefore, it follows that 



Ii5|| < [K{D*D)]-y^ ^ ^ 



and 

Similarly, since $\tilde{\Phi}$ is written as 

\[
\tilde{\Phi} = (\Phi\Phi^*)^{-1} \Phi = (\Phi\Phi^*)^{-1/2} (\Phi\Phi^*)^{-1/2} \Phi,
\]

it follows that 

1 



ll^ll < [Ad($$*)] 



*x-|-l/2 



and 

Using ????????, we derive upper and lower bounds on the eigenvalues of $\mathbb{E}\tilde{\Psi}^*\tilde{\Psi}$ as follows: 

||E***|| ^ \\b*¥.A*Ab\\ 

< I|d*||||eI*I||||5|| 

= ||Ey4M||||D||2 

<^min[l-^<i(^^*)]-'[l-'^n(i))]-^ 

and 

A„(E$*$) ^ \n{b*¥.A*Ab) 

^ an{b*)\a{^A*A)an{b) 
^ Arf(El*i)[l + 5„(£))]-i 

Then, we derive an upper bound on $\theta_s(\mathbb{E}\tilde{\Psi}^*\tilde{\Psi})$ using ???? as follows: 

6's(E**$) < 6i„(E$*$) 

= max[l - A„(E$*$), ||E$*$1| - 1] 
^ max {l - + 5n{D)\-'[l + 
sS max(l - z/„,^^, v^.^^^ - 1) 



+ < 

mm 



+ 



+ 



1 - SniD) 1 - 

5„(I?)0d($$*) ] 



[l-(5„p)][l-0<j ($$*)] 



Combining (??) and (??), we obtain 



niax(6ls (E** , 6*^ (E$*^)) 
max(l - i^^^^l^, - 1) 



+ max(t/,nax,l'min)< 1 + 



6n{D) 

1 - 



Applying this to Corollary 3.3 completes the proof. 



(A.60) 



H. Proof of Theorem 3.7 

The proof of Theorem 3.7 is almost identical to that of Theorem 3.5. The mutual incoherence between $(\tilde{\phi}_\omega)_{\omega \in \Omega}$ and $(\tilde{d}_j)_{j \in [n]}$ 
is bounded in terms of $K$ by 



sup max \(<j)i^ , dj}\ 

wen 

^min 

i n 



\\D\\i'^-,t^ 

ir+(^sup||^.||,.j^_^^^^^^^.(^max||d,| 



(A.61) 

Let $\tilde{K}$ be the right-hand side of (??). Then, we apply the incoherence parameters $K$ and $\tilde{K}$ to Corollary 3.3. It remains to 
bound $\theta_s(\mathbb{E}\Psi^*\Psi)$ and $\theta_s(\mathbb{E}\tilde{\Psi}^*\tilde{\Psi})$. 

In the proof of Theorem 3.4, we derived an upper bound on $\theta_s(\mathbb{E}\Psi^*\Psi)$ given by 



,(E«'**) max(l - l/„iin, i/max " 1) 



(A.62) 
Similarly to the proof of Theorem 3.5, $\theta_s(\mathbb{E}\tilde{\Psi}^*\tilde{\Psi})$ is bounded by 



Then, (??) and (??) imply 



,(E^*^) < max{l - - 6s{D)][l + O^i^^*)]-', 



< max(l - v^l^, lyj^ - 1) + i^Jn 



(A.63) 



max(6'^(E^'**), 6's(E**«')) 

max(l - i/,7,^^, ly^^l^ - 1) + max(i/,„ax, I'min) 



l-6id($$*) l-6'd($$*) 



Applying (??) to Corollary 3.3 completes the proof. 



(A.64) 



Acknowledgements 

The authors thank Saiprasad Ravishankar for providing a sparsifying transform learned using his algorithm [52], which was 
used in the simulations of this paper. 



References 

[1] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way. Waltham, MA: Academic Press, 2008. 

[2] Y. Bresler, M. Gastpar, and R. Venkataramani, "Image compression on-the-fly by universal sampling in Fourier imaging systems," in Proc. DECI, Santa 

Fe, NM, Feb. 1999, pp. 48-48. 
[3] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006. 

[4] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE 

Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006. 
[5] E. Candes and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, p. 969, 2007. 

[6] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization," Proc. Nat. Acad. Sci., vol. 
100, no. 5, pp. 2197-2202, 2003. 

[7] P. Feng, "Universal minimum-rate sampling and spectrum-blind reconstruction for multiband signals," Ph.D. dissertation, University of Illinois at Urbana-
Champaign, December 1997. 

[8] R. Venkataramani and Y. Bresler, "Further results on spectrum blind sampling of 2D signals," in Proc. ICIP, vol. 2, Chicago, IL, Oct. 1998, pp. 752-756. 
[9] E. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005. 
[10] E. Candes, "The restricted isometry property and its implications for compressed sensing," Comptes Rendus Mathématique, vol. 346, no. 9, pp. 589-592, 
2008. 

[11] D. Needell and J. Tropp, "CoSaMP: iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmon. Anal., vol. 26, no. 3, 
pp. 301-321, 2009. 

[12] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230-2249, 
May 2009. 

[13] T. Blumensath and M. Davies, "Iterative hard thresholding for compressed sensing," Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265-274, 2009. 
[14] S. Foucart, "Recovering jointly sparse vectors via hard thresholding pursuit," in Proc. SampTA 2011, Singapore, 2011. 






[15] D. Donoho and J. Tanner, "Precise undersampling theorems," Proc. IEEE, vol. 98, no. 6, pp. 913-924, 2010. 

[16] M. Wainwright, "Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso)," IEEE Trans. 

Inf. Theory, vol. 55, no. 5, pp. 2183-2202, May 2009. 
[17] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constr. Approx., 

vol. 28, no. 3, pp. 253-263, 2008. 

[18] M. Rudelson and R. Vershynin, "On sparse reconstruction from Fourier and Gaussian measurements," Comm. Pure Appl. Math., vol. 61, no. 8, pp. 
1025-1045, 2008. 

[19] H. Rauhut, "Stability results for random sampling of sparse trigonometric polynomials," IEEE Trans. Inf. Theory, vol. 54, no. 12, pp. 5661-5670, 2008. 
[20] H. Rauhut, "Compressive sensing and structured random matrices," in Theoretical Foundations and Numerical Methods for Sparse Recovery, ser. Radon Series 

Comp. Appl. Math., M. Fornasier, Ed. Berlin, Germany: deGruyter, 2010, vol. 9, pp. 1-92. 
[21] H. Rauhut, K. Schnass, and P. Vandergheynst, "Compressed sensing and redundant dictionaries," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 2210-2219, 

2008. 

[22] F. Krahmer and R. Ward, "New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property," SIAM J. Math. Anal., vol. 43, 
p. 1269, 2011. 

[23] E. Candes, Y. Eldar, D. Needell, and P. Randall, "Compressed sensing with coherent and redundant dictionaries," Appl. Comput. Harmon. Anal., vol. 31, 
no. 1, pp. 59-73, 2011. 

[24] M. Lustig, D. Donoho, and J. Pauly, "Sparse MRI: the application of compressed sensing for rapid MR imaging," Magnet. Reson. Med., vol. 58, no. 6, 
pp. 1182-1195, 2007. 

[25] O. Christensen, An Introduction to Frames and Riesz Bases. Boston, MA: Birkhauser Boston, 2003. 

[26] P. Feng and Y. Bresler, "Spectrum-blind minimum-rate sampling and reconstruction of multiband signals," in Proc. ICASSP, vol. 3, Atlanta, GA, May 

1996, pp. 1688-1691. 

[27] M. Mishali and Y. Eldar, "Blind multiband signal reconstruction: compressed sensing for analog signals," IEEE Trans. Signal Process., vol. 57, no. 3, 
pp. 993-1009, Mar. 2009. 

[28] Y. Bresler, "Spectrum-blind sampling and compressive sensing for continuous-index signals," in Information Theory and Applications Workshop, 2008, 
Feb. 2008, pp. 547-554. 

[29] Y. Eldar, "Compressed sensing of analog signals in shift-invariant spaces," IEEE Trans. Signal Process., vol. 57, no. 8, pp. 2986-2997, 2009. 
[30] M. Mishali and Y. Eldar, "From theory to practice: Sub-Nyquist sampling of sparse wideband analog signals," IEEE J. Sel. Topics Signal Process., 
vol. 4, no. 2, pp. 375-391, Feb. 2010. 

[31] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk, "Beyond Nyquist: efficient sampling of sparse bandlimited signals," IEEE Trans. Inf. 

Theory, vol. 56, no. 1, pp. 520-544, Jan. 2010. 
[32] M. Mishali, Y. Eldar, and A. Elron, "Xampling: signal acquisition and processing in union of subspaces," IEEE Trans. Signal Process., vol. 59, no. 10, 

pp. 4719-4734, Oct. 2011. 

[33] Y. Bresler and P. Feng, "Spectrum-blind minimum-rate sampling and reconstruction of 2D multiband signals," in Proc. ICIP, vol. 1, Lausanne, Switzerland, 
Sept. 1996, pp. 701-704. 

[34] J. Ye, Y. Bresler, and P. Moulin, "A self-referencing level-set method for image reconstruction from sparse Fourier samples," Int. J. Comput. Vision, 
vol. 50, no. 3, pp. 253-270, 2002. 

[35] J. Provost and F. Lesage, "The application of compressed sensing for photo-acoustic tomography," IEEE Trans. Med. Imag., vol. 28, no. 4, pp. 585-594, 

2009. 

[36] M. Herman and T. Strohmer, "High-resolution radar via compressed sensing," IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2275-2284, 2009. 
[37] R. Baraniuk and P. Steeghs, "Compressive radar imaging," in Radar Conference, 2007 IEEE, Apr 2007, pp. 128-133. 

[38] L. Potter, E. Ertin, J. Parker, and M. Cetin, "Sparsity and compressed sensing in radar imaging," Proc. IEEE, vol. 98, no. 6, pp. 1006-1020, 2010. 
[39] J. Bobin, J.-L. Starck, and R. Ottensamer, "Compressed sensing in astronomy," IEEE J. Sel. Topics Signal Process., vol. 2, no. 5, pp. 718-726, Oct. 
2008. 

[40] E. Candes and Y. Plan, "A probabilistic and RIPless theory of compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 11, pp. 7235-7254, Nov. 2011. 
[41] O. Lee, J. Kim, Y. Bresler, and J. Ye, "Compressive diffuse optical tomography: Noniterative exact reconstruction using joint sparsity," IEEE Trans. 
Med. Imag., vol. 30, no. 5, pp. 1129-1142, May 2011. 






[42] K. Schnass and P. Vandergheynst, "Dictionary preconditioning for greedy algorithms," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1994-2002, 2008. 
[43] J. Fessler and B. Sutton, "Nonuniform fast Fourier transforms using min-max interpolation," IEEE Trans. Signal Process., vol. 51, no. 2, pp. 560-574, 
2003. 

[44] P. Vaidyanathan, Multirate Systems and Filter Banks. Upper Saddle River, NJ: Prentice Hall, 1993. 

[45] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1982-2001, Apr. 
2010. 

[46] M. Davenport and M. Wakin, "Analysis of orthogonal matching pursuit using the restricted isometry property," IEEE Trans. Inf. Theory, vol. 56, no. 9, 
pp. 4395-4401, Sept. 2010. 

[47] K. Lee, Y. Bresler, and M. Junge, "Subspace methods for joint sparse recovery," arXiv preprint arXiv:1004.3071, 2011. 

[48] Y. Eldar, "Sampling with arbitrary sampling and reconstruction spaces and oblique dual frame vectors," J. Fourier Anal. Appl., vol. 9, no. 1, pp. 77-96, 
2003. 

[49] V. Cevher and S. Jafarpour, "Fast hard thresholding with Nesterov's gradient method," in NIPS Workshop on Practical Applications of Sparse Modeling, 
2010. 

[50] D. Needell and J. Tropp, "CoSaMP: iterative signal recovery from incomplete and inaccurate samples," ACM Report 2008-01, Caltech, Mar. 2008. 
Revised Jul. 2008. 

[51] S. Foucart, "Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants," in Proceedings of the 13th International 

Conference on Approximation Theory, San Antonio, TX, 2010. 
[52] S. Ravishankar and Y. Bresler, "Learning sparsifying transforms for signal and image representation," submitted for publication, 2012. 
[53] S. Becker, J. Bobin, and E. Candes, "NESTA: a fast and accurate first-order method for sparse recovery," SIAM J. Imaging Sci., vol. 4, p. 1, 2011. 
[54] S. Ravishankar and Y. Bresler, "MR image reconstruction from highly undersampled k-space data by dictionary learning," IEEE Trans. Med. Imag., 

vol. 30, no. 5, pp. 1028-1041, 2011. 

[55] V. Paulsen, B. Bollobas, W. Fulton, A. Katok, F. Kirwan, and P. Sarnak, Completely Bounded Maps and Operator Algebras. Cambridge, England: 

Cambridge Univ. Press, 2002, vol. 78. 
[56] R. Smith, "Some interlacing properties of the Schur complement of a Hermitian matrix," Linear Algebra Appl., vol. 177, pp. 137-144, 1992. 
[57] R. Bhatia, Matrix Analysis. New York, NY: Springer, 1997. 

[58] I. C. F. Ipsen and C. D. Meyer, "The angle between complementary subspaces," Amer. Math. Monthly, vol. 102, no. 10, pp. 904-911, 1995. 
[59] M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes. New York, NY: Springer, 1991. 
[60] A. Joseph, A. Melnikov, and R. Rentschler, Studies in Memory of Issai Schur. Boston, MA: Birkhauser Boston, 2003, vol. 210. 



